THE CHOICE OF A RESPONSE METAMETER IN BIO-ASSAY 
D. J. Finney 


Lecturer in the Design and Analysis of Scientific Experiment, 
University of Oxford 


INTRODUCTION 


UNLESS THERE ARE theoretical objections or empirical indications to 
the contrary, the statistician usually assumes that quantitative data 
which are put before him are normally distributed. If he is obliged to 
abandon the hypothesis of normality, he may choose some alternative 
specification of the distribution, or he may seek a transformation of the 
data (1) in terms of which the distribution is normal. The alternatives 
are closely related, for adoption of a transformation implies some 
assumption about the form of the original distribution and is an ana- 
lytical convenience rather than an essentially different approach. Unless 
the data are very extensive, they are unlikely to discriminate satis- 
factorily between two or more equally plausible normalizing transforma- 
tions (such as the square root and the logarithm); it is therefore natural 
to inquire how far conclusions drawn from a statistical analysis may 
be affected by the choice that is made. If the distribution is really 
normal in terms of one transformation, it cannot be normal for another 
unless the second is itself a linear transformation of the first, and in 
general numerical results obtained by the use of different transforma- 
tions will not be identical. 

Similar inquiries might be made about other common assumptions, 
such as those of linearity and homoscedasticity of regressions, often 
introduced by the statistician in order to form a mathematical model 
(on the basis of which the data may be subjected to statistical analysis) 
even though no biological theory demands that particular model. Few 
would deny that these may represent reasonable approximations to 
reality, none would assert their exact and absolute truth. Must the 
fact that different statisticians would not always agree on the choice 
of a normalizing or linearizing transformation for a particular body of 
data be regarded as a flaw in the vaunted objectivity of statistical 
analysis? 
ASSUMPTIONS IN BIO-ASSAY 


During the discussions that led to another paper (6), the relevance 
of this question to biological assay became apparent. The statistical 
analysis of the commonest type of dilution assay, that based on a linear 
regression of response on the logarithm of dose,* involves assumptions 
that, in terms of the dose and response metameters adopted for the 
analysis: 

(a) The distribution of responses for any fixed dose is normal. 

(b) The variance of responses for a fixed dose is independent of the 
dose. 

(c) The mean of the response distribution is linearly related to the 
logarithm of the dose, each preparation tested having its own 
line. 

(d) The line relating mean response and log dose for any test 
preparation is parallel to that for the standard. 

These have been discussed in detail in (6), where they are listed as B5, 
B6, B4, A3 respectively. The first three relate to the mathematical 
model implicit in the form of statistical analysis usually made. The 
fourth is essential to the logic of a dilution assay, for which the test 
preparation must behave as though it were a dilution of the standard 
preparation: unless the regression lines (or curves, if the regressions 
are not linear) are parallel, no assay of the test preparation in terms of 
the standard is conceivable (3,4). None of (a), (b), (c) is ever known 
to be true a priori, though experience of a particular type of assay may 
establish a strong presumption that they are sufficiently near the truth 
for practical purposes. Nevertheless, some check on the basic assump- 
tions should be available from the internal evidence of an assay. Serious 
disagreement between the data and (d) must be regarded as evidence 
that the assay is fundamentally invalid, whereas significant deviation 
from (a), (b), or (c) may be only an indication of invalidity of the 
particular statistical analysis adopted and may perhaps be remedied 
by use of other metameter transformations. Choice between the 
alternative explanations and remedies cannot be made entirely on the 
evidence of one assay, and must to some extent depend upon the experi- 
menter’s (or the statistician’s) experience of previous similar assays. 
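
For orientation, assumptions (a)-(d) can be collected into one model statement. The notation below ($\alpha_S$, $\alpha_T$, $\beta$, $\sigma^2$, $\varepsilon$, $x$) is introduced here purely for illustration and does not appear in the paper; it is a sketch of the usual parallel-line formulation, not the author's own statement of it.

```latex
% y^* is the response metameter of the k-th subject at dose x_{pj}
% of preparation p, where p = S (standard) or T (test).
y^*_{pjk} = \alpha_p + \beta \log x_{pj} + \varepsilon_{pjk},
\qquad \varepsilon_{pjk} \sim N(0, \sigma^2) \ \text{independently}.
% (a) normal errors; (b) \sigma^2 does not depend on dose;
% (c) mean response linear in log dose; (d) common slope \beta for S and T.
% Equating the two mean responses gives the log relative potency of T to S:
\log \rho = \frac{\alpha_T - \alpha_S}{\beta}.
```

Under this reading, a failure of (d) leaves $\rho$ undefined, while a failure of (a), (b) or (c) leaves $\rho$ meaningful but makes the usual arithmetic for estimating it questionable.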


DATA FOR STUDY 


Dr. Jerne pointed out to the writer that his analysis of a prolactin 
assay, used as an illustrative example elsewhere (5), might be misleading. 


*Valid assay systems can be constructed on other foundations, (4), and even violently different 
schemes may sometimes be required (9); the system described here is of wide applicability and provides 
adequate illustration of the theme. 
The final statement of a potency estimate and fiducial limits was based 
not only on the data but also on a series of assumptions not there men- 
tioned, of which (a), (b) and (c) above were the chief, and in fact these 
assumptions were of doubtful validity in the example. A charge that 
assumptions affecting the conclusions have not been explicitly stated 
might be levelled against most statistical analyses which make use of 
normal or other standard distributions, for these are generally considered 
too obvious to need statement on every occasion. Criticism of the 
analysis of the prolactin assay, however, was undoubtedly justifiable, 
as inspection of Table 1 will confirm: the two very high responses 
suggest a positive skewness, and a variance ratio test shows the variance 
to be significantly higher for the high dose than for the low. Never- 
theless, the analysis had not been performed quite unthinkingly. No 
adequate test of normality can be made on the small number of ob- 
servations used for an assay, but general statistical theory teaches that 
means of several independent observations, under very wide conditions, 
have distributions more nearly normal than the distributions of indi- 
viduals: in forming an estimate of relative potency, it is the means for 
different dose-groups rather than individual responses that are im- 
portant. Again, experience has indicated that modification of the 
statistical analysis of an assay in order to allow for the inconstancy of 
the variance of responses adds much to the labour but does not make 
any appreciable difference to the conclusions unless the variation in 
variance is very great. 


TABLE 1 
DATA FOR AN ASSAY OF PROLACTIN 

The responses are the crop-gland weights of pigeons (in 0.1 g.) 

          Dose of standard            Dose of test 
          preparation (i.u.)          preparation (mg.) 
          1.25    2.50    5.00        0.125   0.250   0.500 

            38      53      85          28      48      60 
            39     102     144          65      47     130 
            48      81      54          35      54      83 
            62      75      85          36      74      60 

Totals     187     311     368         164     223     333 


As a second example, for which the skewness is less apparent, but 
which also shows indications of heteroscedasticity, data from an assay of 
testosterone propionate by Pugsley (8) will be used. The design of the 
assay was similar, and the results are shown in Table 2. 


TABLE 2 
DATA FOR AN ASSAY OF TESTOSTERONE PROPIONATE 

The responses are measures of comb-growth of capons (in mm.) 

          Dose of standard            Dose of test 
          preparation (γ)             preparation (γ) 
            20      40      80          20      40      80 

             6      12      19           6      12      16 
             6      11      14           6      11      18 
             5      12      14           6      12      19 
             6      10      15           7      12      16 
             7       7      14           4      10      15 

Totals      30      52      76          29      57      84 


As a study of the extent to which the conclusions from such assays 
might be altered by the use of a response metameter different from that 
directly measured, each set of data was analysed for a series of metameters 

$$y^* = y^i,$$

where i was given a series of values between +3 and -3.¹ For any 
value of i, an estimate of variance within doses (with 18 degrees of 
freedom and 24 degrees of freedom respectively, for the two assays) 
was obtained, calculations were made of various quantities to be used 
in tests of deviations from the basic assumptions (a)-(d), and finally 
an estimate of relative potency, with its lower and upper five per cent 
fiducial limits, was formed. The method of analysis has been fully 
described, for i = 1, in Section 2 of (5). The transformation i = 0 is 
equivalent to y* = log y, and, in this paper, i = 0 will always refer to 
the logarithmic transformation. 
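
As a concrete illustration of the first step, the sketch below applies the power metameter to the prolactin data of Table 1 and forms the pooled within-dose variance. The function names (`metameter`, `within_dose_variance`) are hypothetical, and natural logarithms are used for i = 0 as an assumption; the base of the logarithm changes the variance only by a constant factor.

```python
import math

# Table 1 (prolactin): one list of responses per dose group.
PROLACTIN = [
    [38, 39, 48, 62],    # standard, 1.25 i.u.
    [53, 102, 81, 75],   # standard, 2.50 i.u.
    [85, 144, 54, 85],   # standard, 5.00 i.u.
    [28, 65, 35, 36],    # test, 0.125 mg.
    [48, 47, 54, 74],    # test, 0.250 mg.
    [60, 130, 83, 60],   # test, 0.500 mg.
]


def metameter(y, i):
    """Return y**i, with i == 0 read as the logarithmic transformation."""
    return math.log(y) if i == 0 else y ** i


def within_dose_variance(groups, i):
    """Pooled within-dose mean square of the transformed responses, and its d.f."""
    ss, df = 0.0, 0
    for group in groups:
        transformed = [metameter(y, i) for y in group]
        mean = sum(transformed) / len(transformed)
        ss += sum((v - mean) ** 2 for v in transformed)
        df += len(transformed) - 1
    return ss / df, df


s2, df = within_dose_variance(PROLACTIN, 1)
print(df, round(s2, 2))  # 18 degrees of freedom, as stated in the text
```

Calling `within_dose_variance(PROLACTIN, 0)` gives the corresponding variance on the logarithmic scale.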


VALIDITY TESTS 


For neither assay are the data adequate to provide tests of deviations 
from normality. The transformations tried will greatly intensify any 


¹The adjustments for variation in initial body-weight discussed in (5) were not used for the prolactin 
assay: they would have increased the arithmetical labour without adding useful information. 
positive skewness when i > 1.0, and include extremes of skewness more 
marked than any that would ordinarily be encountered in any practical 
consideration of alternative response metameters. For i < 0, a negative 
skewness appears and becomes steadily more pronounced as i is decreased. 
Thus the range of metameters studied includes a region in 
which no skewness is apparent, and that is about the most that may 
be expected as a criterion of normality on so few data. 


TABLE 3 
SUMMARY OF VALIDITY TESTS 
(for explanation, see text) 

             Assay of prolactin                     Assay of testosterone propionate 
                    Values of t for                          Values of t for 
   i       z     Prepara-  Linearity  Parallel-     z     Prepara-  Linearity  Parallel- 
                 tions                ism                 tions                ism 

  3.0     2.38     0.80      0.54      -0.41       2.91     1.33      2.32       1.34 
  2.5     2.01     0.88      0.44      -0.37       2.37     1.40      2.07       1.37 
  2.0     1.64     0.99      0.30      -0.32       1.83     1.46      1.64       1.38 
  1.5     1.28     1.11      0.12      -0.24       1.29     1.48      0.98       1.36 
  1.0     0.93     1.24     -0.10      -0.12       0.74     1.42      0.08       1.31 
  0.5     0.58     1.38     -0.35       0.03       0.18     1.26     -0.94       1.21 
  0.0     0.24     1.50     -0.63       0.23      -0.38     0.99     -1.89       1.08 
 -0.5    -0.09     1.60     -0.91       0.46      -0.95     0.67     -2.55       0.95 
 -1.0    -0.41     1.67     -1.15       0.70      -1.53     0.36     -2.86       0.85 
 -1.5    -0.73     1.69     -1.32       0.93      -2.11     0.10     -2.90       0.80 
 -2.0    -1.05     1.69     -1.47       1.12      -2.70    -0.11     -2.78       0.77 
 -2.5    -1.37     1.67     -1.54       1.28      -3.29    -0.27     -2.57       0.78 
 -3.0    -1.70     1.64     -1.56       1.41      -3.88    -0.39     -2.34       0.79 

 10%      0.56               1.73                  0.48               1.71 
  5%      0.73               2.10                  0.62               2.06 
  1%      1.07               2.88                  0.90               2.80 


Table 3 summarizes validity tests for the two assays. As a test of 
the independence of variance and response, the error mean square for 
the highest doses of both preparations has been compared with the 
error mean square for the lowest doses in terms of Fisher’s z, with (6,6) 
and (8,8) degrees of freedom respectively for the two assays. Values 
of z for 10%, 5%, and 1% probability levels, on the null hypothesis 
that the true variances at the extremes of dose are equal, are shown at 
the bottom of the columns. For any value of i outside the range 
0.7 > i > -1.5 for the prolactin assay, or 0.9 > i > -0.2 for the 
testosterone, the evidence against homoscedasticity would be judged 
significant, and for values of i very little more extreme the evidence is 
overwhelming. The further calculations for an assay, in their usual 
form, are strictly valid only if the regression of response on dose is 
homoscedastic, but experience suggests that the difference between 
extremes of variance has to be very great before neglect of it has ap- 
preciable effect on the estimate of potency and its fiducial limits. The 
validity test just described is more sensitive to changes of metameter 
of the type under discussion than the other tests summarized in Table 3, 
and may often be unduly sensitive. In well-planned symmetrical assays 
of ordinary size, modification of the statistical analysis on account of 
heteroscedasticity is seldom needed unless the evidence against a con- 
stant variance is significant at least at the 1% level. 
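
The variance comparison described above can be sketched as follows for the prolactin assay. Fisher's z is taken here as half the natural logarithm of the ratio of the two mean squares, each pooled over the standard and test groups at that dose level (3 + 3 = 6 degrees of freedom). The helper names are hypothetical.

```python
import math


def group_ss(values):
    """Sum of squared deviations of a group from its own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fisher_z(high_groups, low_groups, i):
    """Half the natural log of (mean square at highest doses / mean square at lowest)."""
    def pooled_ms(groups):
        transformed = [[math.log(y) if i == 0 else y ** i for y in g] for g in groups]
        ss = sum(group_ss(g) for g in transformed)
        df = sum(len(g) - 1 for g in transformed)
        return ss / df
    return 0.5 * math.log(pooled_ms(high_groups) / pooled_ms(low_groups))


# Table 1: highest and lowest doses of the standard and test preparations.
high = [[85, 144, 54, 85], [60, 130, 83, 60]]
low = [[38, 39, 48, 62], [28, 65, 35, 36]]

z_raw = fisher_z(high, low, 1)   # compare with the i = 1.0 entry of Table 3
z_log = fisher_z(high, low, 0)   # compare with the i = 0.0 entry of Table 3
print(round(z_raw, 2), round(z_log, 2))
```

On the untransformed scale z exceeds the 5% point of 0.73 quoted in Table 3; on the logarithmic scale it falls well below the 10% point.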

In order to avoid troubles arising from observations outside the 
range of linearity of the response curve, corresponding doses of the two 
preparations should be chosen so as to give responses as nearly equal 
as existing information on potency will allow. Failure to achieve 
equality is not itself an indication of invalidity, but is a warning that 
non-linearity may have serious consequences. Both assays under discussion 
are symmetrical, and a significance test of the difference between 
mean responses to the two preparations is therefore relevant to 
this point. Table 3 shows values of t for such a test, based on pooled 
variance estimates from all doses, under the heading ‘Preparations’. 
For neither assay does t attain significance, even at extreme values of 
i, as is seen by reference to the values for 10%, 5%, and 1% probability 
at the bottom of the table. 

The routine analysis of assays of this pattern makes use of measures 
of deviation from linearity (the mean coefficient of a quadratic term in 
the regression equation) and from parallelism (the difference between 
regression coefficients for the two preparations); values of t for these 
are also shown in Table 3, and may also be compared with the entries 
in the last three lines of the table. For the first assay, neither test shows 
invalidity at any value of i. For the second, there are strong indications 
of non-linearity except when 2.5 > i > -0.1. Faced with such results, 
the statistician would see no reason to doubt the fundamental validity 
of either assay, but he would probably feel bound to regard his form 
of analysis for the second assay as invalid if it employed a value of i 
outside this range. It is of interest to note that these validity criteria 
do not always decrease or increase steadily over the whole range of i 
studied. 
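
The three t tests can be sketched with orthogonal contrasts among the six dose-group totals, which is one conventional way of carrying out such tests in a symmetrical assay with three equally spaced log doses per preparation. The contrast coefficients, the direction of subtraction (test minus standard) and the function name are assumptions made for this illustration. The data are those of Table 2; the fourth response at the lowest test dose is taken as 7, the value implied by the column total of 29.

```python
import math


def validity_t(std, test, i=1):
    """t statistics for preparations, linearity and parallelism (3 + 3 dose design)."""
    def tr(y):
        return math.log(y) if i == 0 else y ** i

    groups = [[tr(y) for y in g] for g in std + test]
    n = len(groups[0])                        # equal group sizes assumed
    totals = [sum(g) for g in groups]
    ss = sum(sum((v - sum(g) / n) ** 2 for v in g) for g in groups)
    s2 = ss / (len(groups) * (n - 1))         # pooled within-dose variance
    contrasts = {
        "preparations": [-1, -1, -1, 1, 1, 1],   # test minus standard
        "linearity": [1, -2, 1, 1, -2, 1],       # pooled quadratic term
        "parallelism": [1, 0, -1, -1, 0, 1],     # test slope minus standard slope
    }
    result = {}
    for name, coeffs in contrasts.items():
        value = sum(c * t for c, t in zip(coeffs, totals))
        result[name] = value / math.sqrt(n * s2 * sum(c * c for c in coeffs))
    return result


# Table 2 (testosterone propionate), doses 20, 40, 80 for each preparation.
STD = [[6, 6, 5, 6, 7], [12, 11, 12, 10, 7], [19, 14, 14, 15, 14]]
TEST = [[6, 6, 6, 7, 4], [12, 11, 12, 12, 10], [16, 18, 19, 16, 15]]

t_values = validity_t(STD, TEST, i=1)
print({k: round(v, 2) for k, v in t_values.items()})
```

Each statistic is referred to Student's t with the error degrees of freedom (24 for this assay), as in the last three lines of Table 3.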


ESTIMATES OF POTENCY 


The potency of the test preparation relative to the standard, together 
with its 95 per cent fiducial limits, was calculated on the hypothesis 
that (a)-(d) were true. The limits were obtained by use of Fieller’s 
theorem on the fiducial limits of a ratio (and not by the common 
approximation from a variance of the log potency), in which the quantity 

$$g = \frac{t^2 V(b)}{b^2},$$

where b is the regression coefficient of y* on log dose, plays an important 
part (2,4). A value of g that is little less than unity implies that b only 
just attains a magnitude significantly different from zero, and the 
contribution of errors in b to the errors of the potency estimate is therefore 
large; a small value of g indicates a relatively precise estimation of 
b, errors in which therefore make little contribution to the error of the 
potency estimate. In Table 4, g is tabulated as well as the potency 
estimates. For the prolactin assay, g is always larger than 0.2, and 
reaches a minimum near i = -1.0. For the testosterone assay, g is 
usually small enough to be negligible, and has a minimum near i = 0.5; 
with a large negative value of i, however, and probably also with a 
positive value larger than those studied, it becomes large. 
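
A minimal sketch of the calculation for the prolactin assay at i = 1 follows. It uses the standard Fieller form for a symmetrical parallel-line assay with log doses coded as -1, 0, 1; that coding, the function name and its parameters are assumptions of this illustration, and the value 2.101 is the tabulated two-sided 5% point of Student's t for 18 degrees of freedom (printed as 2.10 in Table 3). The formula is usable only when g is less than 1.

```python
import math


def fieller_potency(std, test, dose_ratio, step, t_crit):
    """Potency estimate, fiducial limits and g for a symmetric 3 + 3 dose assay.

    std, test  : three dose groups each, in increasing order of dose
    dose_ratio : middle standard dose divided by middle test dose
    step       : ratio between successive doses
    t_crit     : two-sided Student t value for the error degrees of freedom
    """
    groups = std + test
    n = len(groups[0])                          # equal group sizes assumed
    means = [sum(g) / n for g in groups]
    ss = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    s2 = ss / (len(groups) * (n - 1))           # pooled within-dose variance
    sxx = 4 * n                                 # sum of squares of coded log doses
    b = n * ((means[2] - means[0]) + (means[5] - means[3])) / sxx
    g = t_crit ** 2 * s2 / (sxx * b ** 2)       # g = t^2 V(b) / b^2, V(b) = s2 / sxx
    m = (sum(means[3:]) / 3 - sum(means[:3]) / 3) / b   # log potency, in dose steps
    half = (t_crit * math.sqrt(s2) / b) * math.sqrt(
        (1 - g) * (2 / (3 * n)) + m * m / sxx
    )
    lower, upper = (m - half) / (1 - g), (m + half) / (1 - g)

    def to_potency(v):
        return dose_ratio * step ** v

    return to_potency(m), to_potency(lower), to_potency(upper), g


# Table 1 (prolactin); middle doses are 2.50 i.u. and 0.250 mg.
STD = [[38, 39, 48, 62], [53, 102, 81, 75], [85, 144, 54, 85]]
TEST = [[28, 65, 35, 36], [48, 47, 54, 74], [60, 130, 83, 60]]

potency_est, lower, upper, g_value = fieller_potency(
    STD, TEST, dose_ratio=10.0, step=2.0, t_crit=2.101
)
print(round(potency_est, 2), round(lower, 2), round(upper, 1), round(g_value, 2))
```

The results may be compared with the i = 1.0 row of Table 4 for the prolactin assay.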


TABLE 4 
SUMMARY OF POTENCY ESTIMATES 
(for explanation, see text) 

             Assay of prolactin                  Assay of testosterone propionate 
                  Potency                              Potency 
   i       g      (i.u. per mg.)  5% limits      g     (g. per g.)   5% limits 

  3.0     0.71       6.96        0.37-22.5     0.050     1.18        0.91-1.54 
  2.5     0.60       6.92        0.85-18.9     0.038     1.16        0.93-1.47 
  2.0     0.49       6.88        1.39-16.4     0.030     1.15        0.94-1.41 
  1.5     0.40       6.84        1.91-14.6     0.023     1.13        0.95-1.36 
  1.0     0.33       6.80        2.36-13.3     0.020     1.12        0.95-1.32 
  0.5     0.28       6.76        2.72-12.4     0.019     1.10        0.94-1.29 
  0.0     0.24       6.71        2.97-11.8     0.020     1.08        0.92-1.27 
 -0.5     0.22       6.66        3.09-11.4     0.025     1.06        0.89-1.27 
 -1.0     0.22       6.58        3.08-11.2     0.034     1.04        0.84-1.28 
 -1.5     0.22       6.53        3.01-11.1     0.048     1.01        0.78-1.31 
 -2.0     0.24       6.42        2.81-11.2     0.071     0.98        0.72-1.34 
 -2.5     0.26       6.32        2.55-11.3     0.104     0.95        0.65-1.39 
 -3.0     0.30       6.19        2.22-11.5     0.148     0.92        0.56-1.46 


The object of the assay is the estimation of the potency of the test 
preparation and the assignment of fiducial limits to the true potency. 
Figures 1 and 2 illustrate the manner in which these results are influenced 
by the choice of i. The potency estimate is not altered to any 
great extent by small changes in i. If shown the data from either of 
these assays, it is unlikely that any statistician would contemplate the 
use of a response metameter of the form y^i with i > 2 or i < -1, and 
over this extreme range the potency estimate changes only by 5%-10%. 

FIGURE 1. 
ESTIMATES OF POTENCY AND FIDUCIAL LIMITS FOR PROLACTIN ASSAY. 
FIGURE 2. 
ESTIMATES OF POTENCY AND FIDUCIAL LIMITS FOR 
TESTOSTERONE PROPIONATE ASSAY. 
[Plot not reproduced; surviving labels: Potency (per g.), Estimate, Lower Limit, horizontal scale from -3.0 to 3.0.] 

270 BIOMETRICS, DECEMBER 1949 


In the absence of prior evidence to the contrary, his choice would almost 
certainly be made between i = 1 and i = 0, so that his subjective 
judgement would affect the estimate only by 2%-4%. 

The fiducial limits are more sensitive to metameter changes. For 
the prolactin assay, the fiducial range is narrowest at about i = -1.0, 
and is appreciably widened (primarily because of the increase in g) if 
i > 0.5 or i < -2.0. For the testosterone assay, the range is narrowest 
at about i = 0.0, and begins to widen seriously for i > 1.5 or i < -1.0. 
There is, of course, no reason to regard narrowness of fiducial range as 
a criterion for the right choice of a response metameter, especially as 
the method of calculating the limits assumes the truth of conditions 
(a)-(d). If there were an exact correspondence between the mathe- 
matical model and the experimental data, exact probability statements 
could be made about the fiducial limits, but that is an ideal impossible 
of achievement. In practice, all that is required of fiducial limits is 
that, when computed by standard rules, they shall give an indication 
of a range within which the true value almost certainly lies. The user 
of a biological assay will wish to base some course of action on its 
results, but he would be unwise to base critical decisions on whether a 
particular value for the potency is just within or just beyond the 
calculated limits. For most of his questions, the limits will give a clear 
answer, but cases of doubt should be resolved by further experimenta- 
tion and not by undue reliance upon the perfect truth of an abstract 
model. The main conclusion to be drawn from Table 4 and Figs. 1 
and 2 is that, over quite a wide range of values of i, neither the potency 
estimate nor its fiducial limits (as calculated by the standard procedure) 
are affected by a change of metameter to an extent that would seriously 
affect decisions which had to be based upon the assay results. 
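
The insensitivity of the point estimate can be checked directly. The sketch below recomputes only the potency estimate for the prolactin assay over a few values of i, using the ratio of the mean difference between preparations to the common slope. The function name, the coding of log doses as -1, 0, 1 and the use of natural logarithms for i = 0 are assumptions of this illustration.

```python
import math

# Table 1 (prolactin); successive doses differ by a factor of 2 and the
# middle doses are 2.50 i.u. (standard) and 0.250 mg. (test).
STD = [[38, 39, 48, 62], [53, 102, 81, 75], [85, 144, 54, 85]]
TEST = [[28, 65, 35, 36], [48, 47, 54, 74], [60, 130, 83, 60]]


def potency(i, std=STD, test=TEST, dose_ratio=10.0, step=2.0):
    """Point estimate of relative potency using the metameter y**i (log y for i = 0)."""
    def mean_t(group):
        return sum(math.log(y) if i == 0 else y ** i for y in group) / len(group)

    s = [mean_t(g) for g in std]
    t = [mean_t(g) for g in test]
    b = ((s[2] - s[0]) + (t[2] - t[0])) / 4.0   # common slope per dose step
    m = (sum(t) / 3 - sum(s) / 3) / b           # log potency, in dose steps
    return dose_ratio * step ** m


for i in (2.0, 1.0, 0.5, 0.0, -1.0):
    print(i, round(potency(i), 2))              # compare with Table 4

p_raw, p_log = potency(1.0), potency(0.0)
print(round(100 * abs(p_raw - p_log) / p_raw, 1), "per cent change between i = 1 and i = 0")
```

For these data the estimates at i = 1 and i = 0 differ by less than 2 per cent.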


CONCLUSIONS 


It is not suggested that calculations of this kind should be under- 
taken as part of the statistical analysis of every biological assay. The 
investigation here reported was undertaken as a corollary to the work 
of Jerne and Wood (6), and as a warning to users of bio-assay against 
uncritical acceptance of a standard pattern of computation without 
thought of the assumptions involved. The theoretical implications of 
a particular choice of metameter must not be forgotten, however little 
alternative choices would alter the inferences made from the data. 

The metameters tried for the prolactin and testosterone assays 
belong to a very restricted class, yet extensive calculations were needed 
for the construction of Tables 3 and 4. A larger set of possibilities 
would be comprised within the formulation 

$$y^* = (y - y_0)^i,$$

where both i and y_0 may be chosen so that the transformed data shall 
satisfy the basic assumptions of the assay and its analysis (7); another 
alternative would be a metameter that recognized the existence of both 
an upper and lower limit to possible values of y. 

Tables 3 and 4 suggest that, important as are (a), (b), (c) to the 
logic of the statistical analysis, the practical conclusions from an assay 
will seldom be seriously affected even by violent changes in the response 
metameter: the transformations 


y*=y 
and 


are sharply contrasted, yet in two examples they produce substantially 
the same conclusions on potency. 

As might be expected, the z-test for heteroscedasticity is very 
sensitive to changes in the metameter, which may support Fieller’s 
view that a transformation for equalizing variances is of prime im- 
portance. On the other hand, application of the z-criterion to the testo- 
sterone assay would lead to rejection of the analyses based on certain 
metameters when in fact the results of such analyses in respect of the 
potency estimate are quite satisfactory. The other criteria proved surprisingly 
insensitive to changes in i for metameters of the type under 
discussion. The response relationship for the second assay would indeed 
have been regarded as significantly non-linear except for a restricted 
range of i, a range that does not agree very well with the indications of 
statistical validity from other sources, but even the most extreme trans- 
formations failed to disturb the parallelism criterion. Other types of 
metameter might disturb the other validity criteria more severely, but 
evidently tests of parallelism and linearity, in assays of ordinary size, 
will often fail to disclose invalidity because of their low sensitivity. 

The present analyses may be regarded as empirical evidence in 
support of the contention of Jerne and Wood (6) that metameters 
should be chosen on the basis of past experience. The confusion that 
follows from any attempt to determine the ideal metameters for a single, 
probably rather small, series of experimental measurements is evident 
from examination of Tables 3 and 4 and Figs. 1 and 2. Before an assay is 
performed, there should be strong reasons for believing in its funda- 
mental validity and knowledge of suitable dose and response meta- 
meters to use in the analysis. The validity tests should then be re- 
garded as a confirmation that no abnormal behaviour of the subjects 
or other disturbance from unknown causes upset either the fundamental 
or the statistical validity of the assay, and not as a demonstration of 
the validity of a particular set of assumptions peculiar to one assay. 
Two different transformations will never lead to exactly the same 
numerical values for the estimate of potency and the fiducial limits, 
and, in the absence of information on which of the two is correct, the 
problem of deciding which is the best set to choose as summarizing the 
data, on the internal evidence of one assay, seems insoluble: if the 
standpoint just recommended be adopted, there are good grounds for 
believing that, in spite of the fact that two statisticians faced with the 
same data would not necessarily use the same metameters, the problem 
is of importance only to the philosopher, and that the decisions of the 
experimenter will not be affected by it to any marked extent. 


SUMMARY 


The effect of choosing various alternative functions of an observed 
response as a metameter in the analysis of biological assays is discussed. 
The argument is illustrated by multiple analyses of two assays, and 
conclusions are drawn relating to the choice of a metameter and the 
interpretation of validity tests in general. 


I am indebted to Dr. N. K. Jerne for the suggestion that multiple statistical 
analyses of the results of an assay, using a series of different metameters, might be 
instructive. I wish to record my gratitude to him and to Dr. E. C. Wood for valu- 
able exchanges of ideas over a long period and for helpful criticism of a draft of this 
paper. 
THE VALIDITY AND MEANING OF THE RESULTS 

INTRODUCTION 


IN MAY 1948, one of us (N. K. J.) wrote to Mr. D. J. Finney making 
certain comments on his paper [13] on the adjustment of biological 
assay results for variation in concomitant observations. Mr. Finney, 
after replying to this letter, discussed the points raised with the other 
of us (E. C. W.), and a tripartite correspondence ensued in the course 
of which almost every link in the long chain of reasoning and inference 
from the performance of a biological assay to the statement of its 
results was critically examined. It seemed to us, whose own thinking 
on the subject had been much clarified, that the main conclusions at 
which we had arrived might usefully be brought together and placed 
on record for the benefit of those who might not have had occasion to 
give so much thought to the theoretical background of biological assays. 

This background is surprisingly complex and varied. Behind the 
experimental design of an assay, and still more behind the final state- 
ment about the result and its fiducial limits, there lies a whole host of 
assumptions and implications. Some are chemical or biochemical, about 
the nature of the Test Preparation (T.P.) whose potency is to be assayed 
and that of the Standard Preparation (S.P.) with which it is to be 
compared. Some are biological or pharmacological, about the response 
evoked in the experimental animals by the stimulus of the doses ad- 
ministered to them. Some are mathematical and in particular statis- 
tical, about the computational processes by means of which the numeri- 
cal results are evaluated. Some, even, are philosophical—if not meta- 
physical !—about the relations between the theoretical abstractions of 


*Address altered since this paper was written to the County Laboratories, Redwell St., Norwich, 
Norfolk. 
pure mathematics and the realities of bio-assay. Not all of these 
assumptions appear to have been explicitly stated before; only in some 
instances has their validity been discussed [4, 8, 16] and the conse- 
quences examined that would follow if in a particular assay one or more 
were invalid. 

The most fundamental assumptions of all are three which it is not 
proposed to do more than state— 


1. That the fundamental concepts of biology and chemistry as applied 
to the methodology of bio-assay are correct. 

2. That the fundamental concepts of the theory of statistics as applied 
to the design and computation of bio-assays are correct. 

3. That the application of the principles of statistics to data from 
biological experiments is legitimate and useful. 


Discussion of these three tenets is outside the present scope; we shall 
take them as our starting-point and shall address ourselves to those for 
whom they are almost axiomatic. 

Most of the symbolism, much of the nomenclature, and some of the 
phraseology of this paper is taken directly from the writings of Finney, 
particularly his address to the Royal Statistical Society in 1947 [12]. 
Moreover, his part in the correspondence referred to above has been 
decisive in preventing errors and fallacies from creeping in, and several 
of the points made below are his in their origin. While the presentation 
and style of this paper is our own, Finney’s ideas permeate it to such 
an extent that he should be considered its godfather if not actually one 
of the parents. Valuable suggestions were also made by Dr. G. Rasch, 
and we have benefited much from discussions with him. Others, of 
course, have written on various aspects of bio-assay, and where we 
have consciously drawn upon these publications due acknowledgement 
has been made. If we have ignorantly overlooked some other anticipation 
of our remarks we offer our apologies in advance. 

In what follows, it is supposed that a dose z ($z_S$ if of the S.P., $z_T$ if 
of the T.P.) is administered to the experimental animal or test subject 
and that this stimulus evokes some quantitatively measurable and 
continuously variable response u (or $u_S$ or $u_T$). Quantal or ‘all-or-none’ 
assays are not considered, though much of the discussion should be 
quite applicable, mutatis mutandis, to such assays. If a number of 
test subjects be given the same dose of the same Preparation under 
constant experimental conditions (such a group will be called a dose- 
group) the response will vary from subject to subject because of exper- 
imental and sampling errors (in bio-assays, the latter will usually be 
so much larger than the former as to constitute the major cause of the 
variation), and the symbol U denotes the expected or ‘ideal’ response 
of the population of test subjects to the dose z. The set of responses 
from any one dose-group will be called an array. An assay (or bio- 
assay) is defined as an experiment designed for the purpose of deter- 
mining the ‘potency’ (exactly what this means is described below) of a 
Test Preparation relatively to some Standard Preparation from the body 
of experimental data collected, by utilising the fact that the two Prepa- 
rations have the same qualitative effect on the test subjects. 

We propose from now on to distinguish between the fundamental 
validity of the assay and the statistical validity of the computations 
which follow it. In Section A, below, the assumptions essential to 
fundamental validity are discussed; for example, the need for a proper 
design of the assay. If any one of these assumptions is untrue for a 
particular assay, the data obtained cannot lead to a correct answer, no 
matter what arithmetical processes are applied to them. If, however, 
one of the essential assumptions for statistical validity (see Section B) 
is untrue, then it is the method of computation that must be amended; 
the assay data are satisfactory but the means adopted of extracting 
from them the information sought are inappropriate. For the final 
statement of the relative potency and its fiducial limits to be ‘valid’, 
tout court, both fundamental and statistical validity are essential. 


SECTION A—THE FUNDAMENTAL VALIDITY OF THE ASSAY 
PART 1. THE ESSENTIAL ASSUMPTIONS 

The assumptions essential to fundamental validity have been dealt 
with by several authors [2, 9, 10, 12, 16, 17, 19, 20, 22, 23]. Nevertheless, 
certain points have emerged from the correspondence mentioned 
above which suggest that it would be worth while to re-state these 
assumptions and to add a few comments. 
A1. The differences between the several arrays in an assay are wholly 
caused either by differences in dosage or by random sampling; in other 
words, had the same dose been given to every test subject the arrays 
would have been random samples from the same population. 

This assumption may be modified if certain factors affecting the 
response are known and recorded in such a way that their influence can 
be taken into account (e.g., by an analysis of covariance), in which 
event such factors may not be randomised. 

A2. The expected response U is a function of the dose z, so that 

$$U = F(z), \tag{a.1}$$

where U is ‘a single-valued strictly monotonic function of z, at least 
over the range of doses to be used’ [12] (Finney’s Condition I). 
This condition is not essential, but is rather a restriction of con- 
venience. The possibility that a certain value of the response might 
be given by either of two different doses implies obvious risks of falla- 
cious conclusions. 

If the S.P. and the T.P. both satisfy this condition for a certain 
range of responses, so that 

$$U_S = F_S(z_S) \tag{a.2}$$

and 

$$U_T = F_T(z_T), \tag{a.3}$$

then for any selected value of U within the range in question, doses 
$z_S$ and $z_T$ can be found such that the expected response to each is the 
same, U. At this level of response it can then be said that the T.P. 
has $z_S/z_T$ times the potency of the S.P., and this is within its limitations 
an unambiguous definition of potency. But because no assumption 
has yet been made about the relation between functions $F_S(\ )$ and 
$F_T(\ )$, no statement can be made about the potency at any other level 
of response. 

A3. If the substance in the S.P. responsible for evoking the charac- 
teristic response from the test subjects is called the effective constituent, 
then the response evoked by the T.P. is due solely to the presence in it 
of the same effective constituent without modification by any other 
substance, ‘so that the less potent preparation behaves as though it 
were a dilution of the other in a completely inert diluent’ (Finney’s 
Condition II).

From this it follows necessarily that, apart from the effects of 
sampling and experimental errors, the estimate of potency obtained by 
a valid assay procedure is independent of the experimental conditions 
or environment, the measurement chosen as the response, and the 
species or variety of test subject used. It is worth making this state- 
ment explicitly, because this assumption is made whenever an estimate 
of relative potency obtained from rats or guinea-pigs is applied to the 
administration of the preparation assayed to cattle or human beings. 

The ratio p = z_s/z_t must, if this ‘hypothesis of similarity’ [23] be
true, ‘be independent of U, for it represents the relative amounts of
the effective constituent in equal doses’ [12]. Thus z_s = p z_t always,
and equation (a.3) can be re-written


U_t = F_s(p z_t), (a.4)


where p is the potency of the T.P. relatively to that of the S.P. at any
level of response; this is the only definition of relative potency that
would normally be regarded by the bio-assayist as satisfactory. As 
Finney says [12], ‘If the data cannot be adequately described by the 
same form of F( ) for both preparations, the basic assumption that 
only the same effective constituents were concerned in both must be 
false... . The assay is therefore invalid’—and it might be added that 
the whole idea of assaying that particular T.P. against that particular 
S.P. becomes absurd.

Similarly, equation (a.4) must be true for any kind of test subject 
and any measurement chosen as response; the form of the function 
F(z) may well change as the subject or the measurement is changed, 
but p must be invariant. 
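
The invariance of p can be illustrated numerically. The following is a minimal sketch in Python, not part of the original argument: the response function f_s, the value 0.4 taken for the relative potency, and the helper names are all hypothetical. It constructs a T.P. curve from an S.P. curve according to equation (a.4), inverts both curves by bisection, and forms the ratio of the equally effective doses (S.P. dose divided by T.P. dose) at several levels of response.

```python
import math

P_TRUE = 0.4  # assumed relative potency of the T.P. (illustrative value only)


def f_s(z):
    """Hypothetical strictly increasing dose-response function for the S.P."""
    return 10.0 + 5.0 * math.log(z)


def f_t(z):
    """T.P. response under the hypothesis of similarity, eq. (a.4)."""
    return f_s(P_TRUE * z)


def dose_for_response(f, u, lo=1e-6, hi=1e6, iters=200):
    """Invert a strictly increasing function f by bisection on a log scale."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if f(mid) < u:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def potency_at(u):
    """Ratio (S.P. dose) / (T.P. dose) giving the same expected response u."""
    return dose_for_response(f_s, u) / dose_for_response(f_t, u)


# The ratio should be the same at every level of response.
ratios = [potency_at(u) for u in (5.0, 10.0, 15.0, 20.0)]
```

If f_t were instead given a different functional form from f_s, the ratios would vary with the level of response, which is the situation described above as invalidating the assay.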

The ‘effective constituent’ can be a mixture of two or more distinct 
chemical compounds, provided they are in fixed proportion in the S.P. 
and T.P. 

It may be worth while, as showing that this discussion is not so 
trite as to be superfluous, to quote some assay procedures in which 
assumption A.3, for one reason or another, does not hold in toto. When 
a preparation containing vitamin D3 is assayed against a S.P. containing
vitamin D2 using rats, apparently valid results are obtained, but the
‘potency’ of the T.P. is much less than if chicks are the test subjects.
Here the less potent preparation certainly behaves like a dilution of the 
other in an inert diluent, but in fact it is not; the effective constituents 
are not the same. Again, remembering that the nutritional require- 
ments of various species differ, it might easily happen that the ‘inert’ 
diluent might be so for one kind of test subject but not for another. A 
variant might occur where the action of a food when given as a T.P. is 
indirect, stimulating the growth in the digestive tract of bacteria which 
in growing synthesise some nutrient utilised by the host. In other 
species not harbouring such bacteria, results might be different. An 
assay of the vitamin B1 content of live yeast by feeding tests, using a
reference sample of killed yeast of known vitamin content as S.P.,
might yield curious results. If given to a species whose gastric juice
was so acid as to kill live yeast before it left the stomach, a valid estimate
of the true vitamin B1 content should be obtained; otherwise, the result
might be quite erroneous, for yeast growing in the gastrointestinal tract
absorbs vitamin B1 from the foods undergoing digestion and thereby
acts towards the host as a depletor of the vitamin. An assay in such
circumstances might well show an apparently negative vitamin B1
content!

An instance of current interest in which assumption A.3 does not 
hold is the assay of diphtheria and tetanus toxoids in commercial 
products containing aluminium hydroxide, using as S.P. a reference
sample of highly purified toxoid. The adsorbent Al(OH)3 in the T.P.
is not inert but interacts strongly, in an unknown manner, with the
‘effective constituent’. Even so, within a certain range of dosage, the 
less potent preparation behaves like a mere dilution of the other, but 
when a wider range is examined it is found that the corresponding range 
of responses is much wider for the Al-adsorbed toxoids than for the 
plain toxoids, i.e., the dose-response curves of the two preparations have 
different upper asymptotes and cannot be described by the same form 
of F(z). The assay is thus invalid.* 

This is an illustration of the fact that in practice one is usually 
aiming at estimating the relative potency of two preparations supposed 
to contain the same ‘effective constituent’ but differing somewhat in
‘diluent’. It may well happen (see above) that the diluent is inert
for some species but not for others. In such cases the distinction
between the effective constituent and the diluent tends to disappear; if
only one kind of test subject be used, one can never be sure whether
the comparison of the S.P. with the T.P. is based on one effective
constituent only or on two, in differing proportions. This can and does
occur in assays of, e.g., hormones and sera.

By comparing the T.P. with the S.P. using several different kinds 
of test subjects, concordant results may inspire confidence in the 
essential similarity of the two preparations; the reverse is equally true. 
This clearly implies that a substantial biological research must precede 
any attempt to set up a bio-assay technique for routine use. 

It should never be forgotten that bio-assay is not itself a basic 
science but an applied science, depending almost entirely upon bio- 
logical research. When a bio-assay is conducted involving reactions 
that have been inadequately studied, the risk involved in making the 
assumptions discussed in this section is very great, and one’s confidence 
in the results obtained should be correspondingly small. 


PART 2. RULES OF CONDUCT BASED ON THESE ASSUMPTIONS 


A1. In every assay there must always be many factors other than
dose affecting the response. Some of these will be known and others 
unknown. Taking first the known factors, there are three ways of 
dealing with them. 

First, there are some which it is both possible and convenient to
hold constant, or substantially constant, for all the test subjects
throughout the assay period. Examples are afforded by the size of cage, the
atmospheric temperature, the amount of handling by the attendants.


*Thompson [21] has shown that useful conclusions can be drawn from certain types of assay in
which a substance other than the ‘effective constituent’ exerts an effect on the response and our
assumption A3 is thus quite untrue. We are not concerned with such assays, which are outside the scope
of this paper.

Secondly, there are others which are known to be important but 
cannot be held constant. Each such factor should be recorded for 
every test subject so that the information can be properly utilised in 
the final calculations, either by an analysis of covariance or by segre- 
gating that factor and its interactions in the analysis of variance. The 
experimental design, and the allotment of individual test subjects to 
the various dose-groups, must be such as will permit one of these two 
alternatives to be employed. Such factors as sex, litter, and body- 
weight might well come into this category in particular assays. 

Thirdly, other factors known or assumed to be unimportant are 
best dealt with by randomising them, so that their effect on the re- 
sponse, though unknown, will not introduce bias into the result of the 
assay. Minor differences of age between test subjects, the order of 
dosing them, and so on, are factors which may properly be randomised. 

The point ought to be made, however, that randomisation is not 
the best way of dealing with a factor that ought to be and could be 
included in the previous category of recorded factors. Randomization 
leaves the worker ignorant of the effect of the factor randomised upon 
the response, and this knowledge is valuable both in the assay itself 
and in planning future assays. Moreover, if the influence of the factor 
is material, randomisation implies an additional assumption—that the 
contribution of the factor to the variance of the transformed response 
is truly additive—and if this assumption is incorrect the reliability of 
the assay is decreased. An illustration is the use in an assay of animals 
some of whom are grouped closely around one initial body-wt. (say 
600g.) and the rest around another (say 400g.). If body-weight affects 
the response, random distribution might result in a series of two-topped 
array distributions and ruin the assumption of Normality (B5 in
Section B).

Care must also be taken to ensure that where randomisation is at- 
tempted it is actually achieved. Departures from the ‘strait and 
narrow path’ of pure randomisation do manage from time to time to 
insinuate themselves most subtly. For instance, Emmens has pointed 
out [8] that to put one’s hand into a cage of animals and take the first 
one caught for the first dose is not random selection unless the doses 
themselves are randomised, for presumably the animals that are caught 
last are the most alert and vigorous in eluding the pursuing hand and 
may differ in responsiveness to dosing from their more easily caught, 
perhaps because more weakly, associates. With orderly dosing
techniques, all the most easily caught animals will be in the first
dose-group and all the most elusive in the last. Similarly it is usual in
microbiological assays to dose successively all tubes in the first
dose-group and then pass to the next; when this deliberate non-randomisation
does not cause trouble it is only because the assay tubes can be made far 
more nearly replicas of each other than can ever be attained with the 
macro-biologist’s test material. 

Occasionally, evidence is obtained from the data of an assay that 
assumption A1 is not valid for that assay; this possibility, and the
consequences that ensue, are discussed in Section B, Part 2. 


A2. The assumption that some function connects the response with 
the dose, and that for each given value of the expected response within 
the range used there is one value and one only of the dose, can justifiably 
be taken for granted in the absence of evidence to the contrary. One 
can imagine circumstances in which the responses to two different doses 
of the same preparation were identical, but the point in practice would 
not cause any trouble. The question of the nature of the function is 
quite another matter; it is dealt with below. 


A3. The Hypothesis of Similarity [23] between the S.P. and T.P. can 
be tested statistically by examining the identity in form of the func- 
tions connecting response with the dose of the two preparations, pro- 
vided that the experimental design permits. The methods are well- 
known and will not be discussed in full here; in assays in which some 
function of the response is linearly related to the log. of the dose, the 
parallelism of the two regression lines is examined, while in assays in 
which the linear relationship is to the dose itself, the intersection of the 
two lines at the zero-dose level is the criterion. The greater the dose- 
range covered by the assay, the more sensitive is the test; but since 
too great a range of doses would involve risk of exceeding the limits 
over which the linear relation holds, a compromise is necessary. In 
theory, it does not matter whether the relation between the chosen 
function of the response and of the dose is linear or not; but the test 
for identity of form would be cumbersome with other than linear curves 
and as the calculation of the result and its fiducial limits would also be
complicated, linearity becomes itself an assumption (see below).

Whatever statistical criterion may be applied to test the hypothesis
that the two response curves are identical in form, it can only be
‘disproved’ or ‘not disproved’; it can never be ‘proved’. Moreover, the
borderline between the two possible answers will depend on the degree 
of probability taken as ‘significant’. The analyst must use his own 
judgment in this respect; he would be justified in accepting as valid 
an assay of a T.P. whose composition was known a priori to be very
similar to that of the S.P., particularly if many T.P.’s of the same kind
had been validly assayed before by the same technique, whereas he
should reject as invalid, or at least reserve judgment on, an assay giving
exactly the same numerical value of ‘validity criterion’ but in which 
the T.P. was of an unfamiliar nature and the assay was novel in type. 
In other words, the statistical calculations present the evidence ob- 
jectively and efficiently, but the decision is taken by the analyst in the 
light of his experience and judgment. 
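
For an assay in which the response is linearly related to the log. of the dose, the examination of parallelism can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular author: the function names and the data are invented, the metameters are taken as y = u and x = log10 z, and a simple t statistic for the difference between the two separately fitted slopes stands in for the analysis-of-variance form of the test. The log potency is computed from the common slope, with potency defined as the ratio of S.P. dose to T.P. dose giving equal responses.

```python
import math


def fit_line(x, y):
    """Least-squares summary of y on x for one preparation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    rss = sum((yi - my - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return {"n": n, "mx": mx, "my": my, "sxx": sxx, "sxy": sxy, "b": b, "rss": rss}


def parallel_line_assay(dose_s, y_s, dose_t, y_t):
    """Parallelism statistic and potency estimate for a parallel-line assay."""
    xs = [math.log10(z) for z in dose_s]
    xt = [math.log10(z) for z in dose_t]
    s, t = fit_line(xs, y_s), fit_line(xt, y_t)
    # Pooled residual variance about the two separately fitted lines.
    df = s["n"] + t["n"] - 4
    s2 = (s["rss"] + t["rss"]) / df
    # t statistic for the difference between the two slopes.
    t_par = (t["b"] - s["b"]) / math.sqrt(s2 * (1 / s["sxx"] + 1 / t["sxx"]))
    # Common slope and log potency (horizontal distance between the lines).
    b = (s["sxy"] + t["sxy"]) / (s["sxx"] + t["sxx"])
    m = (t["my"] - s["my"]) / b + (s["mx"] - t["mx"])
    return {"t_parallelism": t_par, "df": df, "common_slope": b,
            "log10_potency": m, "potency": 10 ** m}


# Invented responses: the T.P. responses are those of the S.P. less 3 units.
doses = [1, 1, 2, 2, 4, 4]
y_sp = [10.1, 9.9, 13.2, 12.8, 16.1, 15.9]
y_tp = [7.1, 6.9, 10.2, 9.8, 13.1, 12.9]
result = parallel_line_assay(doses, y_sp, doses, y_tp)
```

The statistic t_parallelism is to be referred to Student's t with the stated degrees of freedom; as the text insists, whether a given value is to be treated as disproving similarity remains a matter for the analyst's judgment.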

The statistical tests for validity may give misleading results unless 
the mean responses to the doses of T.P. and those to the doses of S.P. 
cover about the same range. If the S.P. responses all fall within the 
limits of linearity but the T.P. responses do not, non-linearity of the 
T.P. curve may cause the rejection of a good assay for apparent non- 
parallelism or non-linearity; worse still, a fundamentally invalid assay 
might fail to be rejected because of accidental agreement in slope of 
the two response curves over parts of their respective ranges that were 
not truly comparable. Occurrence of a significant difference between 
the mean responses to S.P. and T.P., though not an indication of
invalidity in itself, is a warning that the other criteria of invalidity must
be scrutinised with especial care. 

In some instances, as when the T.P. and S.P. are known to be dilu- 
tions of the same pure compound in the same diluent, the truth of 
assumption A3 is self-evident. On the other hand, there are assays 
which are known not to be universally valid, and for which a statement 
of potency ought to be accompanied by the name of the species for 
which it is applicable. But in the majority of assays for which A3 is 
accepted, this is done simply because there is no evidence to the con- 
trary. Therefore, whenever it is possible to carry out assays using 
different test subjects, responses or techniques, or to perform parallel
analyses by chemical or physical methods, the additional evidence thus 
provided is most valuable. 


SECTION B—THE STATISTICAL VALIDITY AND EFFICIENCY OF THE COMPUTATIONS 
PART 1. THE ESSENTIAL ASSUMPTIONS


Section A deals with the matters on which the analyst must satisfy 
himself before he is prepared to make any statement whatever about 
the potency of the T.P. Equally complex considerations are involved 
when deciding the precise numerical values to be quoted as the ‘best’ 
estimate of potency and its fiducial limits. These will depend on several 
assumptions about the nature of the dose-response relationship and the 
computations employed. The assumptions do not, of course, affect the
fundamental validity of the assay; but some assumptions must be made
before any computations can be performed at all, and those on which 
the calculations at present in use are most often based will now be 
enumerated. 

B1. The doses both of the S.P. and T.P. have been measured
sufficiently precisely for errors of measurement in the dose z to be negligible
in comparison with the sampling and experimental errors in the
response u.

This assumption is inherent in the usual formulae by which the
regression equations are calculated. The computations could no doubt
be modified for assays in which this assumption does not hold.

B2. There exists at least one function y of the response u, and one
function x of the dose z, such that (a) to each value of u and z within the
range of the assay there corresponds one and one only real value of the
response metameter y and the dose metameter x respectively; (b) the
transformation of u to y is independent of the transformation of z to x;
(c) the values of y and of x so obtained satisfy assumptions B3 to B6
inclusive.

Possible forms for the functions include, of course, y = u, x = z.

B3. The functions so defined are known and are of such a kind that 
y and x can be computed. 

In theory, the functions instead of being known completely (e.g.,
x = log. z, y = u’) could be known in terms of a parameter or
parameters to be estimated from the assay data (x = log. (n + z), y = u”).
The computations would thereby be extended considerably.

B4. The relation of Y, the expected or ‘ideal’ value of y, to x is exactly 
expressed by the linear equation 


Y = a + bx, (b.1)


(in which a and b are constants to be evaluated from the assay) over the 
range of the observations used in the calculations. (Assumption of 
Linearity.) 

The relation between Y and x could be non-linear but precisely
known. Once again, the calculations would be complicated consider- 
ably; Fieller [9] has shown how curvature can be allowed for. 

B5. For any one value of x within the range of doses used the
frequency distribution of y is exactly normal. (Assumption of Normality.)

Alternatively, the distribution might be not Normal but of other
known mathematical specification, a possibility not likely to be of
much practical importance when the response metameter is a
continuous variate.

B6. The variance of y is independent of Y. (Assumption of
Homoscedasticity.)

This assumption, though usual, is not essential; the variance could
be assumed to depend in a known manner on Y. The consequences of 
doing so are discussed below. 

B7. Provided assumptions B1 to B6 are true, the mathematical process 
actually applied to the data leads rigorously to exact and unique values 
for the ‘best estimate of potency’ and its ‘fiducial limits’.* 

This implies that the computations employed are those appropriate 
to the data of the particular assay being evaluated; that all available 
information (including, e.g., concomitant measurements) has been 
efficiently utilised; that arithmetical blunders have not been committed! 


PART 2. RULES OF CONDUCT BASED ON THESE ASSUMPTIONS 


B1. Very little comment is necessary on the obvious requirement that
the method of dosing used must ensure that each and every test subject 
receives its intended dose with high precision and accuracy relatively 
to the measurement of the response. If vitamins, for example, are 
being given per os, it must be certain that the whole dose is delivered 
into the animal’s mouth and swallowed without loss. There are a few 
assays in which the error of measuring the dose is unavoidably large; 
for example, when the dose is measured as numbers of bacteria injected, 
the magnitude of each dose is dependent upon a plate count of bacteria 
and is thus subject to considerable error. The magnitude of the error 
in x should then be estimated separately if possible and the computa- 
tions modified accordingly. 


B2, B3. The existence of, and the practicability of formulating, the 
equations for transforming the dose and response as actually measured 
into metameters amenable to standard computational processes may 
be taken for granted. It is true in theory that there may be no pair 
of transformations for which assumptions B3 to B6 are true simul- 
taneously. In practice, there will often be many—perhaps an infinite 
number—of transformations for which the critical assumptions are 
satisfied sufficiently closely (see below). For example, if y = u, x =
log. z, are found satisfactory, then there is no doubt that small
variations such as y = u’’, x = log (z + Δ), where Δ is a relatively small
quantity, would give equal satisfaction, to name no other possibilities.
There is no logical reason for preferring one of these satisfactory pairs
of transformations to any other when dealing with a single assay
considered in isolation; a transformation should be selected for which
(first) there is no significant evidence that the assumptions B3 to B6
do not hold good, and (second) the subsequent calculations are as easy
as possible. Too much must not be made of this second point;
compared with the time taken for a full computation of fiducial limits, it
is of no consequence if a few more minutes are spent transforming the
measured response into, say, √(u² + 5) instead of u itself, provided
that some advantage is to be gained thereby.


*For the meaning of these terms, see Section C.

If, of course, there are plausible reasons based on chemical or bio- 
logical theory for assuming a particular relationship between the dose 
and the response, this may provide a basis for the selected transforma- 
tion; but it must be emphasised that the transformations used in 
practice at the present stage of biological assay are almost always purely 
empirical, and the reasons for their selection pragmatic, at least in the 
first instance. Conclusions drawn from such single assays must there- 
fore always be regarded as somewhat tentative. 

The position is materially altered when the assay to be considered 
has a background of previous experience—when it is one of a series 
based on an identical methodology extending into the past and the 
future, or when there has been a prior research into the nature of the 
dose-response relationship under the conditions used in the assay. 
Some of the assumptions can then be tested far more rigorously than 
from the data of an isolated assay. The assumption of Linearity (B4) 
may not be significantly departed from in any one assay of a series; 
but if twenty consecutive assays deviate in the same direction, the 
pooled evidence effectively disproves the assumption for the series as 
a whole. Again, the variance of y may show a tendency to increase as 
x increases which is quite without significance in any one assay but 
because of its occurrence in every assay in the series demonstrates that 
assumption B6 is untrue. When, therefore, a number of similar assays 
has been performed, a transformation ought to be selected for which 
it has been shown by appropriate tests* that departures from each and 
every one of the assumptions made will be as often in one direction as 
in another, so that the results quoted for the assays will be as often too 
high as too low, and over the series of assays will be without any syste- 
matic bias. 
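
The force of pooled evidence from a series can be gauged by a simple sign test. This is offered as one possible way of putting a figure on the argument, not as a test prescribed in the text; the function name is hypothetical, and the assays are assumed independent with deviations equally likely in either direction when the assumption holds.

```python
from math import comb


def two_sided_sign_test(n_same_direction, n_assays):
    """Two-sided binomial probability of a split at least this uneven
    when each direction of deviation has probability one half."""
    k = max(n_same_direction, n_assays - n_same_direction)
    tail = sum(comb(n_assays, j) for j in range(k, n_assays + 1)) / 2 ** n_assays
    return min(1.0, 2 * tail)


# Twenty consecutive assays all deviating in the same direction.
p_all_twenty = two_sided_sign_test(20, 20)  # equals 2 * 0.5 ** 20
```

On these assumptions the probability is about two in a million, although no single assay of the series need show a significant departure.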

*Or for which the bio-assayist feels convinced, on the basis of a long experience, without making
tests. ... It is to be feared, however, that many workers (including the present writers!) tend to omit
the tests on the slightest excuse, usually pleading lack of time.


B4. If there is no significant departure from the linear relationship
for either S.P. or T.P., and provided that such deviations as do occur
in a series of assays show no consistent trend in direction or type, the
assumption of Linearity may be made. On the other hand, it is
certainly not justifiable to take it for granted without evidence, as would
be the case if, for example, a 4-point assay design (two doses of S.P. 
and two of T.P.) were used without other evidence—this is discussed 
more fully below. Even when it has been established by exploration 
of the full dose-response curve that there is a range of doses over which 
the relation between the dose metameter and the response metameter 
is effectively linear, a watch must be kept on subsequent assays to 
ensure that over a period of time the inevitable changes in the sensi- 
tivity of the animal colony to the preparations do not lift or lower the 
mean responses as a whole on to a non-linear part of the dose-response 
curve. This means that there should be at least three dose-groups each 
of the S.P. and T.P. In assays in which one animal of each litter is 
placed in each dose-group, in order to obtain the well-known advantages 
of being able to segregate inter-litter variance, the litters may not all 
be big enough to allow of this being done for more than two doses of 
each preparation. Even so, some litters will contain more than this 
minimum number of animals, and these can be used to form small dose- 
groups additional to the assay proper; their responses may not be used 
in the final calculations, but the evidence they provide, and the sense 
of security thereby engendered, make the extra work well worth while. 
Alternatively, incomplete block designs such as have been developed for 
agricultural experiments can be employed. If a long series of routine 
assays of the same kind is continually being extended, it is very de- 
sirable to carry out at intervals a check on the full dose-response curve, 
using as many doses of the S.P., and over as wide a range, as possible, 
falling above and below the expected linear range as well as within it. 
The 4-point design should be used only when there is either previous 
well-substantiated experimental evidence or a priori knowledge that 
the S.P. and T.P. are of the same nature, i.e., that the Hypothesis of 
Similarity (assumption A3) is justified; for in this design departure 
from this vital hypothesis cannot be distinguished from non-linearity, 
as has previously been mentioned. On the other hand, if all the condi- 
tions for validity were known to be fulfilled, this design would certainly 
make the most efficient use of the animals available, or putting this 
another way, it would enable a specified precision to be attained with 
a minimum number of animals. There are thus sometimes circum- 
stances in which it is the design of choice; but its drawbacks, and the 
risks involved in using it without adequate safe-guards [11], must be 
emphasised.

With other designs, the Between Treatments component of variance 
has 4 or more degrees of freedom, so that it is possible after taking out 
the Linear Regression fraction and the ‘S.P. v. T.P.’ fraction to examine
separately the contributions made by departures from assumptions A3
and B4. (In assays in which x is the logarithm of the dose, these are
usually referred to by the convenient terms of Unparallelism and 
Curvature respectively [8], and we shall use these terms here.) Sig- 
nificant curvature calls for further examination of the data to decide 
its nature, for the action to be taken depends largely on the answer to 
this question. Three principal cases can be distinguished. 


(a) Both the S.P. and T.P. lines are curved, the curvature being in 
the same direction and of approximately the same magnitude in both— 
in assays in which ‘Mean Curvature’ and ‘Opposed Curvature’ can be 
examined statistically using orthogonal comparisons, this means that 
the first is significant and the second not. This suggests that the
transformations used to obtain y and x are inappropriate; if some other can
be found which will linearise the relationship between the dose and 
response metameters for the S.P. it will almost certainly do so for the 
T.P. as well. The assay should be quite valid, however; there is no 
suggestion that assumption A3 does not hold. 


(b) Both lines are curved, but in opposite directions, or one is much 
more curved than the other, i.e., ‘Opposed Curvature’ is significant. 
This means that the assay as it stands is certainly invalid; either the 
Hypothesis of Similarity is not true, or it appears to be untrue because 
the range of doses for which the linear relationship holds has been ex- 
ceeded. This can be checked by inspection and trial; it may be that 
the assay can be linearised by omitting the highest (or the lowest) dose- 
group on one or both preparations. If what is left is a 4-point assay, 
of course, it may be of doubtful value unless there is other evidence 
available as discussed above. 


(c) Neither line is curved but the mean responses deviate from their 
respective regression lines in a random although significant manner, 
i.e., the ‘Between Dose-groups’ deviations are significantly greater than 
the ‘Within Dose-groups’ variance and yet are not accounted for by 
any kind of uniform curvature. This implies that proper randomisa- 
tion has not been attained either in assigning subjects to doses or in 
the errors attaching to dosing, i.e., that assumption A1 does not hold.
The moral is to alter the technique employed in subsequent assays; 
but in calculating the results of the present one, allowance may be
made for the high residual variance between dose-groups (after removing 
the variance attributed to Preparations and Regression) by taking this, 
rather than the within-groups variance, as the square of the quantity 
to be used as the standard error in computing the fiducial limits of the 
assay. This is to be regarded as a device for making the best of a bad
job, not for permitting deliberate and persistent flouting of the need for
randomisation. 
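
The separation of the Between Treatments variance into single degrees of freedom can be sketched for a 6-point assay. The sketch assumes three doses of each preparation, equally spaced on the logarithmic scale, with equal numbers in every dose-group; the coefficients are the usual orthogonal comparisons for that layout, while the function names and the responses are invented for illustration.

```python
# Dose-group order: S1, S2, S3, T1, T2, T3 (low, middle, high dose of each).
CONTRASTS = {
    "Preparations":      (-1, -1, -1,  1,  1,  1),
    "Regression":        (-1,  0,  1, -1,  0,  1),
    "Unparallelism":     ( 1,  0, -1, -1,  0,  1),
    "Mean curvature":    ( 1, -2,  1,  1, -2,  1),
    "Opposed curvature": (-1,  2, -1,  1, -2,  1),
}


def contrast_ss(groups):
    """Sum of squares (one degree of freedom each) for every comparison."""
    n = len(groups[0])
    if len(groups) != 6 or any(len(g) != n for g in groups):
        raise ValueError("expected six dose-groups of equal size")
    totals = [sum(g) for g in groups]
    out = {}
    for name, coeffs in CONTRASTS.items():
        score = sum(c * t for c, t in zip(coeffs, totals))
        out[name] = score ** 2 / (n * sum(c * c for c in coeffs))
    return out


def within_groups_variance(groups):
    """Pooled variance within dose-groups."""
    n = len(groups[0])
    ss = sum((y - sum(g) / n) ** 2 for g in groups for y in g)
    return ss / (len(groups) * (n - 1))


# Invented responses, two test subjects per dose-group.
groups = [[10.1, 9.9], [13.2, 12.8], [16.1, 15.9],
          [7.1, 6.9], [10.2, 9.8], [13.1, 12.9]]
ss = contrast_ss(groups)
s2 = within_groups_variance(groups)
variance_ratios = {name: value / s2 for name, value in ss.items()}
```

Each variance ratio is referred to the F distribution with 1 and 6(n - 1) degrees of freedom. In the terms used above, case (a) corresponds to a large ratio for Mean curvature alone and case (b) to a large ratio for Opposed curvature; case (c) concerns irregular deviations about the lines and is not identified with any single comparison.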


Small departures from Linearity have very little effect on the esti- 
mate of potency, and if this with its fiducial limits is computed ignoring 
the non-linearity component of variance the result will be very close 
to the truth. But the experimenter must be sure that there is no system- 
atic bias in his results as a whole due to a persistent trend in the 
curvature, and above all he must be satisfied that what he takes as 
non-linearity is not something far more serious—namely, invalidity of 
the Hypothesis of Similarity. 


B5. The assumption of Normality is almost impossible to test from 
the data of any ordinary assay, because the replications available are 
far too few, and unless there is something very odd indeed about the 
responses as measured (and this would lead the experimenter to discard 
the assay in any event!) the assumption is taken for granted. The 
consequences of doing so when in fact there is a significantly
non-Normal distribution have been discussed both in the general case
[7, 17] and for the special circumstances of biological assay [10, 12]. It
is reassuring to find that the result of the assay should be almost
unaffected by quite large departures from Normality, because the mean
slope and the mean difference in response between T.P. and S.P., the 
quantities from which the estimate of potency is derived, are means 
based on all the observations, and it is known that such quantities are 
much more nearly Normally distributed than are individual observa- 
tions. But this is not true of the fiducial limits of the estimate of 
potency; they are functions not only of the two quantities just men- 
tioned but also of others which are not means. One of them is Student’s 
t, and Geary [18] has shown that the use of t in tests of significance may
lead to quite erroneous conclusions if the relevant population is skew. 
Further investigation appears to be called for, first by the experimenter 
to examine the frequency distribution of such responses as are used in 
practical assay techniques, using as large a number of test subjects as 
possible in relatively few dose-groups, and secondly by the mathe- 
matician to discover how far non-Normality if it occurs in biological 
assays affects the fiducial limits calculated by the standard formulae 
assuming Normality. 
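
The contrast drawn above between individual responses and means may be illustrated by a small simulation. What follows is only a sketch in Python: the exponential population, the group size of 20, the seed and the function names are all assumptions chosen for illustration and are not taken from any assay discussed here.

```python
import random
import statistics

def skewness(values):
    """Moment coefficient of skewness (zero for a symmetrical distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def simulate(n_per_group=20, n_groups=5000, seed=1):
    rng = random.Random(seed)
    # Hypothetical, strongly skew responses: exponential with mean 1.
    groups = [[rng.expovariate(1.0) for _ in range(n_per_group)]
              for _ in range(n_groups)]
    individuals = [v for group in groups for v in group]
    means = [statistics.fmean(group) for group in groups]
    return skewness(individuals), skewness(means)

skew_individual, skew_mean = simulate()
print(f"skewness of individual responses: {skew_individual:.2f}")
print(f"skewness of dose-group means:     {skew_mean:.2f}")
```

Under these assumptions the group means should show far less skewness than the individual responses. The sketch says nothing about the fiducial limits, which depend also on quantities that are not means.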


B6. The assumption of Homoscedasticity is usually made in the 
absence of any significant evidence to the contrary. It must be ad- 
mitted that the point is rarely examined by any more stringent tests 
than ‘eyesight’ inspection. Fieller [10] has given it as his opinion that
this is the primary requirement to be satisfied by the response
metameter and that other requirements are secondary, but Finney [12] is
not in complete agreement with him on this point. If the assay design 
is symmetrical and well-balanced as regards the doses employed of 
S.P. and T.P., the result obtained by weighting the mean response in 
each dose-group according to its variance will differ very little from 
that obtained by assuming Homoscedasticity. If the variance shows a 
consistent trend upwards or downwards from the lower to the upper 
limit of response, particularly in a series of consecutive assays, then 
some other transformation which will equalise the variance over the 
range of the assays must be sought, or alternatively the calculations 
must be modified so as to take into account the dependence of the 
variance on Y. The form of the relationship can be approximately 
inferred from the internal evidence of the assay (better still, of a series 
of assays), and, even if incorrect, the correction thus introduced will 
be better than no correction at all. But the bio-assayist who performs 
his own computations, and is usually averse from any extension of the
arithmetical labour, will probably prefer to use some other transforma- 
tion of his data in the first place so as to be able to make use of well- 
tried orthodox formulae. 
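
As a hedged illustration of a transformation that equalises the variance, the sketch below simulates three dose-groups whose standard deviation rises in proportion to the level of response, and compares the ratio of the largest to the smallest group variance before and after taking logarithms. The log-normal responses, the group centres and the name `variance_ratio` are assumptions made for the example only.

```python
import math
import random
import statistics

def variance_ratio(transform, centres=(10.0, 20.0, 40.0), n=2000, seed=2):
    """Largest group variance divided by the smallest, after transforming."""
    rng = random.Random(seed)
    variances = []
    for centre in centres:
        # Hypothetical responses: log-normal with median `centre`, so that
        # the standard deviation grows in proportion to the response level.
        responses = [rng.lognormvariate(math.log(centre), 0.2)
                     for _ in range(n)]
        variances.append(
            statistics.variance([transform(y) for y in responses]))
    return max(variances) / min(variances)

raw_ratio = variance_ratio(lambda y: y)  # responses as measured
log_ratio = variance_ratio(math.log)     # logarithmic response metameter
print(f"variance ratio, untransformed: {raw_ratio:.1f}")
print(f"variance ratio, log metameter: {log_ratio:.2f}")
```

A ratio near 1 after transformation is what Homoscedasticity requires. With real assay data the appropriate transformation would of course have to be inferred from the data, as the text describes.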

B7. It is clearly necessary to ensure that the mathematical processes 
employed are appropriate to the assay technique, the transformations 
of dose and response used, and the assumptions made. Occasionally 
one encounters an attempt to work out the result of a ‘slope-ratio’ assay 
using the formulae applicable to ‘log-dose’ assays! But apart from
such blunders, it is not always easy to be sure that every available piece 
of information in the data has been utilised as fully as possible, and 
that the methods of statistical estimation by which the formulae are 
derived are of maximum efficiency in the statistical sense—that is to 
say, in the sense that a sample mean is a more efficient estimate of the 
population mean than the average of the two extreme values would 
be, or the estimate of the standard error derived from the squares of 
the deviations from the mean is more efficient than one derived from 
the range. It is for the statisticians to assure themselves that the com- 
putational methods they use and teach to bio-assayists are beyond re- 
proach in these respects; there are problems here which are not yet 
solved. One, which strictly speaking is outside the scope of this paper, 
is the correct method of making allowance in quantal assays for the 
increase in precision theoretically obtained by litter-mate control, just 
as it is taken into account as a matter of course nowadays in
quantitative assays. Yet only when the statistical techniques are as powerful
as possible in the prevailing state of knowledge should the ‘fiducial 
limits’ really be dignified with that title; figures computed by inefficient
methods are approximations and unworthy of a name with such precise 
implications. 
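
The first of the two comparisons of efficiency mentioned above, the sample mean against the average of the two extreme values, can be checked by simulation. The sketch below assumes a Normal population with mean 0 and standard deviation 1 and samples of 10; these figures and the function name are illustrative assumptions.

```python
import random
import statistics

def compare_estimators(n=10, n_samples=20000, seed=3):
    """Sampling variances of the mean and of the mid-range."""
    rng = random.Random(seed)
    means, midranges = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        # Average of the two extreme values of the sample.
        midranges.append((min(sample) + max(sample)) / 2)
    return statistics.variance(means), statistics.variance(midranges)

var_mean, var_midrange = compare_estimators()
relative_efficiency = var_mean / var_midrange
print(f"variance of sample means: {var_mean:.4f}")
print(f"variance of mid-ranges:   {var_midrange:.4f}")
print(f"efficiency of mid-range relative to mean: {relative_efficiency:.2f}")
```

Both estimators are centred on the population mean, but the mid-range scatters more widely about it, so that a relative efficiency below 1 is to be expected for a Normal population.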

The utilisation of all available information implies that the data of 
one dose-group ought not to be omitted from the calculations for the 
sole reason that it makes them easier. Again, if the body-weights of 
the animals have been recorded at the beginning of, say, a pharmaco- 
logical assay, the theoretically correct method of utilising most effi- 
ciently this information is via a covariance analysis [3, 5, 6, 13, 15]. 
The alternative practice of giving doses of drugs and the like in quanti- 
ties proportional to body-weight is time-honoured but—as Bliss and 
Marks pointed out in 1939 [5]—illogical and inferior. There are three 
possibilities: the ‘ideal’ response, other things being equal, may be (a) 
independent of initial body-weight (b) linearly related to body-weight 
(c) related in some other way to body-weight. If (a), the administration 
of doses proportioned to body-weight is quite unsound. If (b), it 
will lead to the same results as if an analysis of covariance were per- 
formed. If (c), an analysis of covariance will give more accurate results 
more efficiently. Now as Mr. A. L. Bacharach has pointed out [1] the 
time spent in calculating and measuring out each separate dose for 
individual animals according to their body-weights is usually more than 
would be required for the extra computations in an analysis of co- 
variance if the dose had been constant per dose-group. Moreover, in 
nutritional, serological, and bacteriological work, doses proportional to 
body-weight are almost never given. Probably the fact that the worker
knows how to calculate proportional doses, but does not know how to 
perform an analysis of covariance, has something to do with the preju- 
dice in favour of the former! In any event, one would expect a natural 
persistence of the old method (the only possibility in the days before 
covariance analysis had been invented) until knowledge of the newer 
computational techniques has become widespread and the reluctance 
of many bio-assayists to ‘do more sums’ than they can possibly help 
has been overcome. But it is worth pointing out that if an analysis 
of covariance were performed in appropriate circumstances, not only 
would the result of the assay be evaluated more efficiently, but some 
information could be obtained about the nature of the relation between 
responses and body-weight instead of this being assumed linear without 
evidence as when the ‘proportioned-dose’ technique is used. The same 
technique could and should be applied to any other factors besides 
body-weight known to be important; randomisation of such factors is 
a poor substitute. 
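
The arithmetic of a simple covariance adjustment is not heavy. The following sketch, offered only as an illustration, adjusts dose-group mean responses to a common body-weight by means of the pooled within-group regression of response on weight. The function name, the group labels and the figures are hypothetical, the responses being constructed so that the true regression on body-weight is 0.5 and the true difference between the groups is 4.

```python
from statistics import fmean

def covariance_adjusted_means(groups):
    """groups maps a dose-group label to a list of (body_weight, response)."""
    grand_weight = fmean(w for obs in groups.values() for w, _ in obs)
    sxy = sxx = 0.0
    centres = {}
    for label, obs in groups.items():
        w_bar = fmean(w for w, _ in obs)
        y_bar = fmean(y for _, y in obs)
        centres[label] = (w_bar, y_bar)
        sxy += sum((w - w_bar) * (y - y_bar) for w, y in obs)
        sxx += sum((w - w_bar) ** 2 for w, _ in obs)
    slope = sxy / sxx  # pooled within-group regression coefficient
    adjusted = {label: y_bar - slope * (w_bar - grand_weight)
                for label, (w_bar, y_bar) in centres.items()}
    return slope, adjusted

# Hypothetical data: response = group effect + 0.5 * body-weight.
data = {
    "standard": [(w, 10.0 + 0.5 * w) for w in (100.0, 110.0, 120.0)],
    "test":     [(w, 14.0 + 0.5 * w) for w in (120.0, 130.0, 140.0)],
}
slope, adjusted = covariance_adjusted_means(data)
raw_difference = (fmean(y for _, y in data["test"])
                  - fmean(y for _, y in data["standard"]))
adjusted_difference = adjusted["test"] - adjusted["standard"]
print(slope, raw_difference, adjusted_difference)
```

In this constructed case the unadjusted difference between the groups is inflated by the heavier animals in the second group, and the adjustment recovers the difference built into the data. A full analysis of covariance would also supply the error variance and a test of the regression, which this sketch omits.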


SECTION C


THE MEANING OF THE TERMS ‘ESTIMATE OF POTENCY’ 
AND ‘FIDUCIAL LIMITS’ 


When all the assumptions discussed in Sections A and B have been 
validly made and the appropriate computations performed, there 
emerges the grand dénouement, the fulfilment of all the experimental 
and arithmetical labour—the result of the assay. This will normally 
consist of (a) the estimate of potency, for which we shall use the symbol 
R; (b) the fiducial limits of R, which may be symbolised as $p_L$ and $p_U$
for the lower and upper limits respectively. While every bio-assayist 
has a more or less clear idea of what is meant by these terms, he might 
well have difficulty in framing their definitions in reasonably exact 
words, and there are many aspects of them which deserve more con- 
sideration than they are normally given, particularly the term ‘fiducial 
limits’. Reference should be made here to an important paper by Yates 
[24] on this subject, from which some of the points made below are 
taken. It is couched, however, in phraseology rather too mathematical 
for the reader who is not himself a mathematician. 


PART 1. THE ESTIMATE OF POTENCY 


In one sense this is very easy to define, for the principal object of 
the assay is obviously to estimate the potency p as defined in Section A, 
and R is thus the estimate from the assay data of the ‘ideal’ or ‘true’ 
potency p. But by using different methods of calculation, different 
estimates of p could be obtained, all more or less plausible. The statis- 
tician, however, will say that there will be one estimate which is the 
‘best’ in the sense that it is fully efficient; that other inefficient estimates 
are inferior; and that he can state the general rules for calculating un- 
ambiguously and with certainty this ‘best’ estimate R on the basis of 
the assumptions made.* 

*It may be in certain assays that even allowing the truth of all the assumptions listed in Sections
A and B, more than one fully efficient estimate of potency can be obtained by following different rules
of estimation. These will then be equally ‘good’; the remainder of the present discussion is still
applicable to these estimates.

What he means by the ‘best’ estimate can be defined without much
difficulty or subtlety. Let it be supposed that a long series of assays
are carried out using the same S.P. and T.P., the same test subjects
(or as nearly comparable test subjects as can be obtained), the same
environment and the same methodology; that all the assumptions of
the previous sections are exactly true for all these assays; and that to
the data of each assay are applied alternative methods of calculating
an estimate of p. Corresponding to each method of calculation there
will be obtained a long series of estimates. Then that computational 
method is ‘best’ for which (a) any one of the estimates obtained by 
using it would approach indefinitely closely to p if the assay size were 
increased to include more and more test subjects without limit; (b) the 
series of estimates has a standard deviation smaller than that of any 
similar series produced from the same experimental data by some other 
computational process. The first requirement ensures that the estimate 
is ‘consistent’ in the statistical sense; the second ensures that it is 
‘efficient’, i.e., that the result of an assay does not deviate from the 
truth by more than is inherent in the experimental data, as distinct 
from the method used of extracting therefrom the information sought. 
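
The two requirements may be restated in symbols, though only as a hedged paraphrase. Here $R_N$ denotes the estimate given by a particular computational method from an assay of $N$ test subjects, and $R'_N$ the estimate given by any rival method from the same data; $N$, $R_N$ and $R'_N$ are notation introduced for this restatement and do not appear in the text above.

```latex
\begin{align*}
\text{(a) consistency:}\quad
  & \lim_{N \to \infty} \Pr\bigl(\lvert R_N - p \rvert > \varepsilon\bigr) = 0
    \quad \text{for every } \varepsilon > 0, \\
\text{(b) efficiency:}\quad
  & \operatorname{s.d.}(R_N) \le \operatorname{s.d.}(R'_N)
    \quad \text{over the long series of assays.}
\end{align*}
```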

In practice, of course, the assumptions of Sections A and B will not 
be precisely true for any one assay, much less the entire series. Never- 
theless, provided that there is no systematic bias involved in any one 
assumption, so that the departures from the assumptions over the series 
are random, that method of calculating the estimate of potency which 
is ‘best’ in the ideal case will also be best in the practical case. 

The manner in which to decide the computational process which 
should be applied to any given set of data is outside the scope of this 
communication except in so far as it is discussed in Section B, Part 2 
and in Section D. In the main, this is a question of general statistical 
theory. 

PART 2. THE MEANING OF ‘FIDUCIAL LIMITS’ 


Most bio-assayists appreciate that while the estimate of potency is 
the best single figure to quote as the result of the assay, it is not possible 
to assign any definite value to the probability that it is correct. In 
order to state the ‘reliability’ of the assay result quantitatively, it is 
necessary to give the fiducial limits corresponding to some arbitrary 
level of probability. (This is usually 95%, and in what follows this 
will be taken as the probability level for discussion, though there may 
often be good reasons for using some other figure.) If pressed for a 
definition of what his fiducial limits represent, the bio-assayist will often 
say that they are the limits within which it is probable, with odds of 
19 to 1 on, that the true potency lies. If rather more knowledgeable 
than this, he will know that orthodox statistical philosophy denies the 
possibility of assigning a true frequency or probability distribution to 
an unknown population parameter (except in certain rather artificial
problems of no practical importance). He will then cast his definition
in some such mould as this: ‘The fiducial limits $p_L$ and $p_U$ are quantities
(the largest and smallest possible respectively) such that if the true
potency were below $p_L$, then not more than 2½% of all samples (i.e.,
assays) drawn from such a population would yield an estimate of
potency as high as R, while if the true potency were above $p_U$ not more
than 2½% of the samples would yield an estimate as low as R’. The
difference between the previous inaccurate but simple definition and
the more accurate but complicated definition should be noted; the latter
avoids the pitfalls of inverse probability by a statement of the form
‘if . . . (hypothesis about the population) then . . . (deducible
consequence about the sample)’ i.e., the inference is from the population to
the sample. But even so, there are complexities behind this seemingly
satisfactory definition which deserve examination. The logical chain
of reasoning may be set out in three steps, as in the ensuing paragraphs,
where the argument is simplified without any loss of generality by
supposing that the samples drawn consist of n animals all in one
dose-group, and that from the mean m and standard error s of the response,*
statements are to be made about the ‘true’ response $\mu$ and standard
deviation $\sigma$ of the population. Extension to assays and statements
about ‘true’ potency follows without any alteration in the reasoning.

(a) If the mean and standard deviation of the (supposedly Normal) 
population were both known, the probability P could be stated that 
the mean m of a sample of size n would lie above a given value $m_U$ or
below a given value $m_L$. Conversely, if it were P that was given, the
corresponding values of $m_U$ and $m_L$ could be stated. These statements
would be based on the distribution of the quantity $[(m - \mu)/\sigma]\sqrt{n}$,
which is known precisely for such a population, for it is the ratio of the 
difference between the population and sample means to its standard 
error. 

If a long series of samples is taken from the same population, the 
values of m and of the sample standard error s will in general vary 
from sample to sample. There will also in general be a difference
between the predicted value of P as above and the actual proportion of
samples having means outside the limits $m_L$ to $m_U$. This difference
will be due entirely to the sampling error of m; it will have nothing to 
do with the sampling error of s, which does not enter into the formula 
by which P is calculated; and it will clearly tend to zero as the number 
of samples increases indefinitely, i.e., in an infinite series of samples 
the proportion lying outside the stated limits will be exactly P. These 
three statements are important in the light of what follows. 
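
Step (a) can be sketched numerically. The code below is an illustration only: it assumes that P is divided equally between the two tails, and the population values, sample size and function names are hypothetical. It computes the limits for the sample mean from a known mean and standard deviation, and then counts how often the means of simulated samples fall outside them.

```python
import random
from statistics import NormalDist, fmean

def limits_for_sample_mean(mu, sigma, n, P=0.05):
    """Lower and upper limits for the mean of n observations (sigma known)."""
    z = NormalDist().inv_cdf(1 - P / 2)  # assumes P split equally
    half_width = z * sigma / n ** 0.5
    return mu - half_width, mu + half_width

def proportion_outside(mu, sigma, n, n_samples, seed=4, P=0.05):
    """Observed proportion of sample means lying outside the limits."""
    rng = random.Random(seed)
    lower, upper = limits_for_sample_mean(mu, sigma, n, P)
    outside = 0
    for _ in range(n_samples):
        m = fmean(rng.gauss(mu, sigma) for _ in range(n))
        outside += not (lower <= m <= upper)
    return outside / n_samples

m_L, m_U = limits_for_sample_mean(mu=0.0, sigma=1.0, n=4)
observed = proportion_outside(mu=0.0, sigma=1.0, n=4, n_samples=50000)
print(f"limits: {m_L:.3f} to {m_U:.3f}; observed proportion: {observed:.4f}")
```

Since the sample standard error s plays no part in these limits, the observed proportion differs from P only through the sampling error of m, and the discrepancy shrinks as the series is lengthened.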

*The standard error s must be calculated by an efficient method, i.e. from
$s^2 = \sum (x - m)^2 / (n - 1)$.

(b) When $\mu$ is known but the standard deviation is not known,
statements of the same kind can still be made, based on the known
distribution of Student’s $t = [(m - \mu)/s]\sqrt{n}$. This does not involve
any inferences about the probable value of $\sigma$ for a given s (s of course
represents the standard error calculated from the sample) and the state- 
ments made may thus appear at first to have the same logical founda- 
tions as before. 

There is, however, a difference. A statement, based on the value 
of s for the first sample, such as ‘the probability is P that the mean of 
a sample of size n and standard error s will lie below $m_L$ or above $m_U$’
must contain the italicized words if only by implication, and will be 
incorrect if it does not.* But if a long series of samples is taken, s will 
not remain constant throughout the series. The difference between P 
and the actual proportion found in the long series will depend partly 
on the sampling error of m as before, but also on the departure of the 
value of s for the first sample from the mean value $\bar{s}$ for the whole series;
it will not in general tend to zero as the number of samples increases. 
This point does not appear to be made in any of the standard text-books;
it is made by Yates [23].

(c) The further step from a known to an unknown $\mu$ does not
introduce any additional logical difficulty. One can still make
statements of the same kind as before provided they are prefaced by the
introductory words ‘If the mean of the population from which this
sample was drawn be assumed to be $\mu$, then . . . ’. To reduce the
number of variables, let P be fixed from now on at 0.05. Then against
various hypothetical values of $\mu$ the corresponding values of $m_L$ and
$m_U$ could be tabulated for a given sample of size n and standard error s.
Once the table had been constructed, one could readily select from it
two values of $\mu$ such that if the lower $\mu_L$ were correct, 97.5% of samples
of size n and standard error s (note these words) would have a mean of
m or less, and if the upper $\mu_U$ were correct, 97.5% would have a mean
of m or more. Then, using the conventional criterion of P < 0.05,
hypothetical values of $\mu$ outside the range $\mu_L$ to $\mu_U$ are regarded as
disproved, and the term ‘fiducial limits’ is attached to $\mu_L$ and $\mu_U$.

The point made in paragraph (b) above, that s will not in fact 
remain the same from sample to sample, still applies in exactly the 
same way. For example, if a long series of samples were drawn from 
a population of which the mean was $\mu_L$ and the std. deviation unknown
but estimated by s (the std. error of the first sample), the proportion 
of means above m would not in general be 2.5%, nor would it even tend 
to that figure as the size of the series increased. 
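
The point may be made concrete by a short calculation, given here as a sketch under stated assumptions. The limits are taken as m ∓ t·s/√n, with t the tabulated two-sided 5% point of Student’s t for 9 degrees of freedom (a sample of 10). The second function supposes a population whose mean is the lower limit and whose true standard deviation is `sigma`, and gives the proportion of sample means that would exceed m. The function names and numerical inputs are hypothetical.

```python
from statistics import NormalDist

T_5PC_DF9 = 2.262  # tabulated two-sided 5% point of Student's t, 9 d.f.

def fiducial_limits(m, s, n, t=T_5PC_DF9):
    """Limits m -/+ t*s/sqrt(n) computed from a single sample."""
    half_width = t * s / n ** 0.5
    return m - half_width, m + half_width

def proportion_of_means_above_m(s, sigma, t=T_5PC_DF9):
    """Long-run proportion of sample means above m when the population mean
    is the lower limit m - t*s/sqrt(n) and the true s.d. is sigma.
    The sample size cancels out of the expression."""
    return 1 - NormalDist().cdf(t * s / sigma)

mu_L, mu_U = fiducial_limits(m=10.0, s=2.0, n=10)
print(f"fiducial limits: {mu_L:.3f} to {mu_U:.3f}")
for s in (0.8, 1.0, 1.2):  # s of the first sample, in units of sigma
    print(f"s = {s:.1f} sigma: proportion above m = "
          f"{proportion_of_means_above_m(s, 1.0):.4f}")
```

The proportion depends on how far the s of the first sample happens to lie from the true standard deviation, and lengthening the series of samples does nothing to bring it to 2.5%.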


*Unless s for the first sample happens by chance to be exactly the same as the mean value $\bar{s}$ for
the whole series of samples. This colossal ‘fluke’ may be ignored.

We are in practice concerned with the problem, not of making one
isolated statement about the fiducial limits of a population mean based 
on one sample, but of making a long series of such statements each 
based on one sample from a different population each time (e.g., suc- 
cessive routine assays of different Test Preparations). Suppose that in 
spite of the logical flaw under discussion, we always calculate fiducial 
limits as above and report on each experiment in the words ‘The mean
result of this experiment is m and, for a probability of 0.95, the
fiducial limits of the population mean are $\mu_L$ and $\mu_U$’. Then these
limits will sometimes be too wide (when s for the experiment concerned 
is above the mean value we should have obtained for s if we had per- 
formed a very large number of identical experiments) and sometimes 
too narrow (when the reverse is true). We shall err as often in the one 
direction as the other, so there will not be any systematic bias in our 
statements. It is not true, however, that 95% of them will be correct; 
in general, no single statement in this form will be exactly correct, 
except by chance, just as no single value of s obtained in an experiment 
is an exact estimate of the mean $\bar{s}$ for the hypothetical series of
experiments.

There is another way, however, of putting the same point which 
leads to more practical conclusions. Instead of saying that the limits 
$\mu_L$ and $\mu_U$ are sometimes further apart and sometimes closer together
than the true limits corresponding to a probability of 0.95, one could 
say that the true probability corresponding to the limits quoted is 
sometimes more and sometimes less than 0.95. If a long series of similar 
assays were reported on in the same terms, the average probability 
would converge to 0.95 as the series became longer. If, therefore, the 
wording of the reports were to be amended to “the mean result of this 
experiment is m and the true value of the quantity estimated by m lies
between $\mu_L$ and $\mu_U$” it would then be true to say that the percentage
of correct reports would be close to 95%. 
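
The long-run behaviour of reports worded in this amended form can be examined by simulation. In the sketch below every assay is given its own true mean and standard deviation, as with successive routine assays of different preparations; the ranges from which these are drawn, the sample size of 10 and the function name are assumptions made purely for illustration.

```python
import random
from statistics import fmean, stdev

T_5PC_DF9 = 2.262  # tabulated two-sided 5% point of Student's t, 9 d.f.

def fraction_of_correct_reports(n_assays=20000, n=10, seed=5):
    """Fraction of reports 'the true value lies between the limits'
    that are correct over a long series of different assays."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_assays):
        # Each assay has its own hypothetical true mean and s.d.
        mu = rng.uniform(50.0, 150.0)
        sigma = rng.uniform(5.0, 20.0)
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m, s = fmean(sample), stdev(sample)
        half_width = T_5PC_DF9 * s / n ** 0.5
        correct += (m - half_width <= mu <= m + half_width)
    return correct / n_assays

fraction = fraction_of_correct_reports()
print(f"fraction of correct reports: {fraction:.3f}")
```

The limits of any one assay are too wide or too narrow according to the s it happens to yield, yet the fraction of correct assertions over the series should settle close to 0.95, provided the responses are Normal as the simulation assumes.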

It is curious and at first sight paradoxical that by thinking about 
the proportion of true assertions in a long series, we have arrived at a 
form of wording which appears to be as heretical an example of ‘in- 
verse probability’ as the first definition of fiducial limits, which was 
deservedly rejected. But the reader who finds himself confused at this 
point should consider further the difference between the statements 
‘On the evidence of this isolated assay, I assert that there is a prob- 
ability of 95% that the true mean lies between X and Y’ and ‘On the 
evidence of this isolated assay, I assert that the true mean lies between
X and Y, and in saying so I also assert that over a long series of similar 
assertions made on similar assays, I shall be correct 95% of the time.’ 

Definitions of the terms ‘mean result’ and ‘fiducial limits’ may now
be advanced which summarise these arguments and conclusions. It is 
hoped that they may be helpful both to the bio-assayist with sufficient 
mathematics to appreciate an accurate form of wording and to those 
who take decisions on his reports and are interested only in the practical 
application of them. 

The ‘mean result’ R is the best single estimate that can be formed, 
provided that certain basic assumptions are valid, of the true value of 
the quantity which it was the object of the assay to estimate. Because 
of the inherent and unavoidable errors of biological assays, the estimate 
R is subject to uncertainty to an extent which is measured by the 
‘fiducial limits’ $p_L$ and $p_U$.

The meaning to be attached to these figures is that if the true value 
of the quantity estimated were as low as $p_L$, it is very probable that
further assays similar to the present one would give a mean result lower
than R, and if the true value were as high as $p_U$, it is equally probable
that further assays would give a result higher than R. The degree of 
probability referred to in these statements is not far removed from 
0.95; if a long series of similar assays were performed and the fiducial 
limits of each calculated in the same way, the average probability in- 
volved in statements of the same kind would be very close to 0.95, and 
the longer the series, the closer the approximation would be. 

For practical purposes, the quantities $p_L$ and $p_U$ may be regarded
as limits within which it is very probable that the true value lies. If in 
a long series of assays the assertion is made each time that the true value 
lies between the fiducial limits, then 95% of these statements will be 
correct. 


SECTION D 
THE RELATIONS BETWEEN BIO-ASSAY, STATISTICS, AND REALITY 


This discussion would not be complete without some further con- 
sideration of the criteria for selecting those functions of the doses and 
responses as actually measured which are to be used as ‘metameters’ 
in the calculations, and of the effect on the results if the assumptions 
enumerated in Section B are not rigidly true for the transformations 
chosen. It has been pointed out in Part 2 of that Section that the most 
to be expected of metameters in practice is that over a series of assays 
only random and non-significant departures from assumptions B2 to 
B6 are shown. Even then it is certain that there will be variants of 
these metameters which would have been found equally satisfactory 
and would have given somewhat different fiducial limits [14]. It is 
then a difficult problem to decide which particular transformations 
should be used. It would clearly be very desirable, if it were possible,
to ensure that the choice was not in any way connected with the
predilections of the individual computer; for otherwise the calculations
would lose that objectivity which is always claimed as one of the chief 
advantages to be gained from the use of statistics, and credence would 
be given to the gibe often heard from those suspicious of statistical 
‘manipulations’, that six different statisticians, set to work on the same 
data, will produce six different results. This is regarded as evidence 
that the statistician, like a dishonest accountant, ‘cooks the books’ so 
as to produce whatever answer is most expedient. In so far as there is 
any substance in such a charge, it is largely because the criteria to be 
applied in selecting the metameters do not lead to an unique choice. 
Wishful thinking conjures up the impractical idea of a set of working 
rules which would lay down an order of preference for transformations, 
so that, other things being equal, $\log u$ would be tried before $1/u$;
$1/u$ before $1/\sqrt{u}$; $1/\sqrt{u}$ before $1/\sqrt{u + 5}$; and so on. But it is
important to realise that the selection of the transformation to be used 
is, as was stated earlier, a purely empirical process; there is hardly ever 
any theoretical reason for preferring any one to any other. 

Indeed, it is possible to maintain that such questions as ‘is the dis- 
tribution of the response metameter precisely Normal?’ have no real 
meaning at all; remembering that the question relates to an imaginary 
population of responses that might have been obtained if the experiment 
had been replicated ad infinitum, and that the only evidence is a 
ridiculously small sample of perhaps 20 responses, one could retort that 
the answer to the question is not only unknown but for ever unknowable. 
From this it is a small step to the suggestion that the question should 
never have been posed and indicates a completely unrealistic approach 
to the problem. 

This is in fact the gravamen of a second criticism sometimes heard 
of statistical calculations—that before they can be made at all, so many 
simplifying assumptions are necessary that the operations which follow 
are conducted upon theoretical abstractions (e.g., a perfectly Normal 
variate exactly linearly related to another completely error-free variate, 
etc.) which bear no relation to the realities of which they are the idealised 
simulacra. 

The sufficient answer to such criticism is that if it is illegitimate to 
apply to the things of this world calculations based on the entities of a 
theoretical world with slightly different properties, then none of the 
computations of physicists, engineers, and astronomers—to name no 
others—have any meaning at all. The computers who predict the 
performance of a new locomotive before it has left the drawing-board 
are making assertions about a figment of the imagination, a Ghost
Train, to which the real engine when built will approach more or less 
nearly. In the early days of such computations, lack of knowledge 
resulted in poor approximation, because the simplifying assumptions 
were too simple. As knowledge progresses, it becomes possible to make 
assumptions which although more complicated are more nearly related 
to the truth. The answers thus obtained are better approximations; 
but the calculations are much more tedious and time-consuming. 
Ultimately there comes a point where further approximation even if 
possible is not made use of; the increased precision thereby obtained is 
unnecessary for practical purposes and the extra labour spent on the 
computations would be mere waste of time. Whatever Absolute Truth 
may be—and the answer is metaphysical—we can all recognize the 
very real existence of what may be called Engineer’s Truth; when the 
stage just described in the approximative process has been reached, the 
Truth has to the engineer been attained. 

The processes of statistics follow an analogous path. In the present 
discussions several examples have been given of the possibility of making 
allowance in the calculations for departure from the strict truth of this 
or that assumption—a trend in curvature, a uniform increase in the 
variance of y as x increases, and so on. The added complications are 
quite practicable but they are also tedious. Does it really matter that 
the fiducial limits of an assay are evaluated as 13.9 to 21.6, though a 
further two hours of arithmetic would have shown that a better ap- 
proximation was 14.2 to 21.4? Under what circumstances could it 
happen that the action taken on the result of the assay (and there is 
little point in an assay on which no action whatever is to be taken) 
would be altered thereby? And would it be realistic to suggest that 
because both sets of figures are only approximations based on a set of 
theoretical abstractions, therefore no fiducial limits should be computed 
at all and action should be based on the subjective opinions of some 
responsible (or irresponsible) person? Looked at from this point of 
view, the criticism now being rebutted is mere defeatism and tantamount 
to a denial that the pure sciences can be applied to human affairs and 
terrestrial phenomena. 

But it is perhaps necessary to enter a caveat with which this philoso- 
phical—perhaps even metaphysical—dissertation may be closed. The 
approximations must and should be made, certainly, but the errors 
introduced must be random and not systematic, and above all they 
must not introduce bias of a kind arising from the mental characteristics 
and prejudices of the statistician himself. Approximations or not, the 
answers evaluated must be objective, and over a long series of such 
answers it must not be possible to point to any constant drift away
from the truth in some particular direction. Provided the deviations 
are random, and the fiducial limits quoted are as often too wide as too 
narrow, the statistician may well reply to his critics that if the action 
taken in all circumstances is consistently based on his reports, in the 
long run a smaller proportion of errors will be committed than if it 
were based on any other kind of criterion. He may then fairly ask 
them whether any other class of persons making reports on which 
action is taken can claim as much! 


SUMMARY 


Bio-assay results are usually summarised in two statements; an 
estimate of potency and the fiducial limits of this estimate. The truth 
of these statements depends on a number of assumptions which always 
have to be made but are never all stated explicitly when the results of
the computations are presented. These assumptions fall into two 
groups: 


Section A. Three assumptions must be made if a body of data is to 
be regarded as a fundamentally valid assay. 
A1. The hypothesis of validity of the experimental design.
A2. The hypothesis of existence of a single-valued dose-response 
relationship. 
A3. The hypothesis of similarity of the test and standard prepara- 
tions. 


Section B. The statistical validity of the computation of the results 
depends on seven assumptions. 
B1. The assumption of relative precision in the measurement of dose.
B2. The assumption of existence of satisfactory dose and response 
metameters. 
B3. The assumption of computability of the metameters. 
B4. The assumption of linearity between the dose and response 
metameters. 
B5. The assumption of normality of the response metameter dis- 
tribution. 
B6. The assumption of homoscedasticity of the response metameter. 
B7. The assumption of efficiency in the computational methods. 


Both these sets of assumptions are defined, the extent to which they 
are essential or may be modified is discussed, and their effect upon the 
methods of conducting the assay and the calculations arising from it 
is considered. 


Section C. The precise meaning of the terms ‘estimate of potency’ and
‘fiducial limits’ is stated and analysed. 


Section D. Bio-assay depends entirely upon biological research. So 
long as knowledge of the biological and biochemical mechanisms under- 
lying the assay data is incomplete, bio-assay results will always be 
subject to errors which cannot be included in the statements of fiducial 
limits. 

Certain criticisms affecting the legitimacy of statistical calculations 
as applied to bio-assay are refuted. Such calculations represent the 
only common-sense way of dealing with quantitative biological results. 
Before action is taken on the result of a bio-assay, it should always be 
considered how far the assumptions enumerated above are sufficiently 
well-founded in that particular assay to be acceptable for practical 
purposes. 
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A BIOLOGICAL ASSAY OF TUBERCULINS 
R. A. FISHER 


University of Cambridge 


A FEW YEARS AGO it was decided to carry out under the authority 
of the Agricultural Research Council a specific test, using bovine 
subjects, of the relative potency of two tuberculin preparations, which 
may be designated as Standard and Weybridge respectively. Such a 
test constitutes essentially a biological assay of the tuberculins, and a 
report of its results may be of interest, since little seems to be known 
of the statistical problems involved in the use of the tuberculin reaction 
for such a purpose. 

For the test ten herds in different parts of England were used, 
and from each, twelve cows were chosen and assigned to four treatment 
groups, each group thus receiving three cows of each of ten herds. The 
groups differed only in the sites at which the tuberculin was applied. 
The treatments applied at each site were 


A    Standard     0.1 mgm. 
B    Standard     0.05 mgm. 
C    Weybridge    0.05 mgm. 
D    Weybridge    0.025 mgm. 


The sites of application, four on each side of the neck, were num- 
bered from one to eight in such a way that numbers five to eight on 
the left side corresponded with numbers one to four on the right. At 
each site, the measurement made was a thickening of the skin observable 
in a set number of hours after intradermal injection of the tuberculin. 
The treatments of the four classes of cow are set out in the following 
table: 


                 Treatment Class 
Sites          1     3     2     4 
3 and 6        A     B     C     D 
4 and 5        B     A     D     C 
1 and 8        C     D     A     B 
2 and 7        D     C     B     A 
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The following sections will incorporate parts of the report made by 
the author to the Agricultural Research Council in March 1944. The 
interest of the matter to biometricians lies in the fact that tuberculin 
readings present in an acute form the need to develop ad hoc an ap- 
propriate theory of errors. This need is often present and sometimes 
unrecognised in other types of biological response. The preliminary 
investigations, by which a theory of errors appropriate to these readings 
was built up, may therefore be of assistance to workers with other 
material. 

It will be seen that the method judged by these tests, and verified 
a posteriori to be appropriate to the material, is essentially that of $\chi^2$ 
analysis as ordinarily used with observations of frequency. This was 
adopted not only because it works well, but because it lies ready made 
to the hand of the statistician. I do not think it is the only mode of 
analysis which could have been usefully applied. Indeed the Eulerian 
distribution 


$$\frac{1}{(p-1)!}\,\frac{x^{p-1}}{a^{p}}\,e^{-x/a}\,dx$$ 
having, for variable p, variances proportional to the means and giving 
exhaustive simultaneous estimation based only on the arithmetic and 
geometric means of each sample, would seem to supply an equally 
effective mode of approach and one which it would be of considerable 
mathematical interest to develop. I cannot, however, imagine that it 
should give a different answer to the practical question at issue. 


I should add, what was not known to me when I wrote the report, 
that careful comparative tests with guinea pigs, well designed and of 
high precision, gave in fact a ratio of 0.9 instead of 2.2 for the two 
materials. They must, therefore, in reality be qualitatively different, 
although there is no indication of this within the scope of the bovine test. 

The analysis of the experiment designed to assay the potency of 
Weybridge P.P.D. H, Tuberculin encountered two difficulties: 

(a) That arising from the very great variation in the reaction of 
different cows. This of course had been foreseen as inevitable in un- 
selected material, and it had been proposed that the series of trials first 
carried out should be regarded in one aspect as a means of selecting 
animals of uniformly high reactivity, a panel of which could be used for 
a more accurate assay. 

As this had been found impracticable, it was necessary to utilize 
data involving the full variation in reactivity of unselected material. 
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(b) It was anticipated that equivalent reactions would be obtained 
from like sites on opposite sides of the same neck. The data available 
from the repetition of the test on 120 cows in all show that no such 
similarity is to be relied on, but that significant differences between 
Right and Left occur, and that these differences are strikingly different 
at the four chosen sites. In fact the data have to be examined as if 
each of the eight test-points on each animal had its own characteristic 
sensitivity. 

In consequence of these two drawbacks the methods of reduction 
which we had hoped to use appeared on examination to be quite 
inadequate for the purpose of combining the information available from 
the different parallel sets of animals. In forming a judgement as to the 
manner in which such factors as tuberculin-potency interact with the 
differences in sensitivity of different animals and of different sites, the 
most valuable information is provided by the fact that on each animal 
certain pairs of sites, namely 1 and 8, 2 and 7, 3 and 6, 4 and 5, are 
invariably treated alike, although the actual treatments used on these 
pairs of sites are varied for animals of the four different classes. 


Preliminary analysis seemed to indicate that the difference in 
response at two sites on different cows was proportional to the sensitivity 
of the cows, and that the difference in response to different treatments 
was proportional to the general average response which such treatments 
provoke. This, so far as it may prove to be true, is a most valuable 
generalisation. Together with a second observation, namely that the 
variance to be ascribed to any observation, whether owing to the 
individuality of the animal or to errors of measurement, is approximately 
proportional to the magnitude of the measurement to be expected, 
it does allow of a rational and comprehensive form of analysis. 
To demonstrate the approximate truth of these views, the animals 
in each class were divided, using the total reaction at 48 hours, into 4 
groups of reaction-intensity: thus, of the 30 animals in class 1, four, 
giving reaction at 48 hours of 0-19mm. in all at the 8 sites, are of the 
lowest class of reactors (α), nine give reactions of 20-49mm., i.e. on 
the average 2½ to 6mm., (class β), eleven give total reactions of 
50-79mm. i.e. 6-10mm. on the average, (γ), and six give total reactions 
of 80 or more, (δ). 
Taking, for example, sites 1 and 8, which with these 30 cows both 
receive treatment C (Weybridge 0.05mgm), if a and b are the 
measurements observed at any stage, e.g. 72 hours, we can calculate 
$(a - b)^2/(a + b)$ for each cow, and for any group of cows $(A - B)^2/(A + B)$ 
where A and B are the sums of a and b. Then for variation in the ratio 
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of measurement at site 1 to that at site 8 among the 4 cows of the 
lowest sensitivity-class, one has three degrees of freedom, yielding 


$$\sum \frac{(a - b)^2}{a + b} = 6.7777$$ 

$$\frac{(A - B)^2}{A + B} = 2.8824$$ 

leaving 3.8953 


as the contribution of these three degrees of freedom. Since there are 
three other pairs of sites equally comparable on each cow, also with 
nearly equal total reaction, one can in this way make up 12 degrees 
of freedom, obtaining the total of 6.9948, measuring variation of the 
same sort within homogeneous material. The three other classes of 
cow, in which these same sites receive treatments, A, D, and B re- 
spectively, bring up the total degrees of freedom to 108, with a total 
sum of squares of 40.8580, and a mean-square measured in this way for 
the least responsive class of cows (α) of 0.38mm. 
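
The calculation just described can be sketched in a few lines of Python. This is an illustration only: the function name `ratio_heterogeneity` and the four pairs of readings are hypothetical and are not taken from the report; only the expression $(a - b)^2/(a + b)$, its pooled form, and the subtraction of one from the other come from the text.

```python
def ratio_heterogeneity(pairs):
    """Heterogeneity of the ratio a:b among cows.

    pairs: list of (a, b) readings at two sites treated alike, one pair per cow.
    Returns (sum of per-cow terms, pooled term, remainder); the remainder
    carries len(pairs) - 1 degrees of freedom.
    """
    # Cows with a completely zero reading contribute nothing and are skipped.
    per_cow = sum((a - b) ** 2 / (a + b) for a, b in pairs if a + b > 0)
    total_a = sum(a for a, _ in pairs)
    total_b = sum(b for _, b in pairs)
    pooled = (total_a - total_b) ** 2 / (total_a + total_b)
    return per_cow, pooled, per_cow - pooled


# Hypothetical readings in mm (NOT the paper's data), four cows of one class.
readings = [(2.0, 1.0), (3.0, 1.5), (1.0, 2.0), (2.5, 2.5)]
per_cow, pooled, remainder = ratio_heterogeneity(readings)
degrees_of_freedom = len(readings) - 1

# When every cow shows the same ratio a:b the remainder vanishes.
_, _, proportional_remainder = ratio_heterogeneity([(2, 1), (4, 2), (6, 3)])
```

Dividing by $a + b$ is what makes cows of very different absolute sensitivity comparable, since the variance of a reading is taken to be proportional to its expected magnitude.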

The point of this procedure is the comparison it allows between 
cows of very different absolute sensitivity. For the four classes of 
cows chosen one has the results shown below: 


Reactivity-class    Degrees of Freedom    Sum of Squares    Mean-Square 

α,  31 cows               108                40.8580          0.3783 
β,  43 cows               156                70.0597          0.4491 
γ,  35 cows               124                50.8053          0.4097 
δ,  11 cows                28                14.3365          0.5120 

    120 cows              416               176.0595          0.4232 


Measured in this way, therefore, the gross heterogeneity between 
cows of different sensitivity-classes has practically disappeared, and the 
contributions of unequal numbers of cows in these classes to the evi- 
dence may be satisfactorily weighted. Further it appears that the 
ratio of reaction-measurement at two comparable sites is nearly the 
same whatever treatment these sites receive. For each sensitivity-class 
of cow, twelve degrees of freedom have been excluded from the analysis 
above, representing possible differences of this kind. For the four 
sensitivity-classes, these are: 
Class    Degrees of Freedom    Sum of Squares    Mean-Square 

α              12                  6.5598          0.5466 mm. 
β              12                  5.9111          0.4522 
γ              12                  1.7141          0.3862 
δ              12                 12.3481          1.0290 

               48                 26.5331          0.5528 mm. 


Apart from the slight suggestion that in the most sensitive class of 
cows some heterogeneity in the site-ratio has been introduced by varying 
the tuberculin used, these figures show that there is little danger of 
being misled if the data are treated as though the ratio of the response 
at different sites, both in different cows and to different tuberculins, 
were a constant property of those sites. This is important, since of 
the four pairs of sites treated alike, three (namely 1 and 8, 3 and 6, 4 
and 5) all show significantly unequal response in the aggregate examined. 
Finally not only is the variation homogeneous within groups of cows 
showing very varying sensitivity to tuberculin, but the ratio of re- 
sponse in the four classes of cows chosen for their different sensitivity 
is also the same. For this we have three degrees of freedom for each 
pair of sites, or twelve in all: 


VARIATION AMONG DIFFERENT SENSITIVITY-CLASSES α, β, γ, δ 


Degrees of Freedom    Sum of Squares    Mean-Square 

        12                5.0866          0.4239 


On the basis of this preliminary investigation, which has been set 
out in detail above for readings at 72 hours, the problem of estimating 
the proportionate increase in swelling measured produced (a) by doub- 
ling the quantity of tuberculin, and (b) by replacing a given amount of 
Standard tuberculin by half the quantity of Weybridge 10, becomes 
tolerably straightforward. 

The method used in the original report, although substantially accu- 
rate in the results it gave, was not well suited as a methodological model, 
and may be replaced for our present purposes by one of equivalent 
accuracy and perhaps greater clarity. 

Taking, for example, the data for readings at 48 hours, and adding 
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together readings at the two sites treated alike and on the 30 cows 
treated alike, the aggregate results of the test may be expressed by the 
following 4 X 4 table, to which is appended on the right a key to the 
treatments used in the form of a non-cyclic Latin Square. 


                     Cow Class 
Sites         I      III      II      IV     Total        Key 

3 + 6        454     249     349     249     1301      A  B  C  D 
4 + 5        408     322     312     347     1389      B  A  D  C 
1 + 8        523     268     411     285     1487      C  D  A  B 
2 + 7        364     283     266     290     1203      D  C  B  A 

Total       1749    1122    1338    1171     5380 


Treating these aggregate measurements as quasi-frequencies, the data 
now have a form closely similar to that which arises with a three-point 
linkage test in genetics, in which also we have 16 observable frequencies, 
classifiable in three orthogonal categories, assigned arbitrarily to the 
rows, columns and letters. In such a case, for example, we have typi- 
cally four different triple heterozygotes used as parents and assigned 
to the four rows, four modes of gamete formation (crossover classes) 
assigned to the four columns, and four pairs of complementary genotypes 
distinguishable associated with the four letters of the square. 

If, as sometimes happens, these pairs of complementary genotypes 
are not equal in viability, the frequencies to be expected in the sixteen 
entries will be affected not only by factors representing modes of gamete 
formation and abundance of material from the four possible sources, 
but by a third unknown set of factors representing relative viabilities. 

The statistical problem will then consist in assigning sixteen ex- 
pectations to the sixteen cells of the table, each expectation being the 
product of three appropriate factors, all of them unknown. 

An examination of this statistical problem shows that the solution 
of maximum likelihood is such that the sums, by rows, by columns, and 
by letters, of the expectations are equal to the corresponding sums of 
the observed frequencies. This is a statistical solution of the utmost 
simplicity, although the algebraic problem of constructing expectations 
fulfilling these marginal conditions, and the condition of being triple 
products, seems to be one of some intricacy. I have elsewhere discussed 
certain approximate methods of approach.¹ 
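
One way to construct such expectations numerically is iterative proportional fitting, which rescales a table to each set of marginal totals in turn until all three are met. The sketch below is not the approximate method referred to in the text: the function name `fit_triple_product`, the number of sweeps and the use of this particular algorithm are assumptions made for illustration. The totals are those of the 4 × 4 table above, with its Latin Square key.

```python
OBSERVED = [
    [454, 249, 349, 249],
    [408, 322, 312, 347],
    [523, 268, 411, 285],
    [364, 283, 266, 290],
]
LETTERS = ["ABCD", "BADC", "CDAB", "DCBA"]  # key to the treatments


def fit_triple_product(observed, letters, sweeps=200):
    """Expectations of the form row factor x column factor x letter factor
    whose sums by rows, columns and letters match those of the observations."""
    n = len(observed)
    cells = [(i, j) for i in range(n) for j in range(n)]
    row_totals = [sum(observed[i]) for i in range(n)]
    col_totals = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    letter_totals = {}
    for i, j in cells:
        key = letters[i][j]
        letter_totals[key] = letter_totals.get(key, 0) + observed[i][j]

    # Start from a constant table; every rescaling keeps the product form.
    fitted = [[1.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):  # match row totals
            current = sum(fitted[i])
            fitted[i] = [v * row_totals[i] / current for v in fitted[i]]
        for j in range(n):  # match column totals
            current = sum(fitted[i][j] for i in range(n))
            for i in range(n):
                fitted[i][j] *= col_totals[j] / current
        sums = {}
        for i, j in cells:  # match letter (treatment) totals
            sums[letters[i][j]] = sums.get(letters[i][j], 0.0) + fitted[i][j]
        for i, j in cells:
            fitted[i][j] *= letter_totals[letters[i][j]] / sums[letters[i][j]]
    return fitted


expectations = fit_triple_product(OBSERVED, LETTERS)
```

The four letter factors are left free here; no proportionality between them is imposed.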

The tuberculin data are in one respect slightly simpler than the 


¹R. A. Fisher (1949) Note on the test of significance for differential viability in frequency data 
from a complete three point test. Heredity 3, 2, 215-219. 
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corresponding genetical problem, for in this case it is to be presumed, 
unless the data indicate otherwise, that the effect of doubling the dose 
is the same whichever of the two tuberculins is used, i.e. that the factors 
corresponding with the letters A, B, C and D shall be in proportion. 
This circumstance opens the way to an effective approximate estimate 
of these factors. 

It will be noticed in the symbolic square that the four quarters are 
constituted by 2 x 2 Latin Squares such as 


A B 
B A, 


so that the ratio A : B, representing the ratio of the readings for double 
and single injections, can be consistently estimated from the product 
ratio of the four observed total measurements, i.e. from 


$$\sqrt{\frac{454 \times 322}{408 \times 249}}$$ 


In practice it is most convenient to work with natural logarithms, 
so that we have 


Treatment    Total Measurement    Natural Logarithm 
    A               454                1.51293 
    B               408               −1.40610 
    A               322                1.16938 
    B               249               − .91228 
                                        .36393 
  A : B                                 .18196 


The weight of this logarithmic estimate is (“The Design of Experiments”, 
Section 70) the harmonic mean of the four frequencies, namely 
339.69. 
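
The arithmetic for a single 2 × 2 square can be checked with a short Python sketch. The function name `latin_2x2_log_ratio` and its argument order are illustrative choices, not notation from the report. Taking logarithms of the totals themselves, rather than of the totals divided by 100 as tabulated above, leaves the log ratio unchanged.

```python
import math


def latin_2x2_log_ratio(first_1, second_1, first_2, second_2):
    """Log ratio first:second from a 2 x 2 Latin Square of total measurements.

    first_1, first_2   : the two totals receiving the first treatment
    second_1, second_2 : the two totals receiving the second treatment
    Returns (natural log of the ratio, weight), the weight being the
    harmonic mean of the four totals.
    """
    log_ratio = 0.5 * math.log((first_1 * first_2) / (second_1 * second_2))
    weight = 4.0 / (1.0 / first_1 + 1.0 / second_1 + 1.0 / first_2 + 1.0 / second_2)
    return log_ratio, weight


# Totals from the upper left quarter of the table: A = 454, 322 and B = 408, 249.
log_ab, weight_ab = latin_2x2_log_ratio(454, 408, 322, 249)
```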

Taking in turn the three other similar included 2 X 2 squares, and 
remembering that the ratio C : D is to be presumed equal to that of 
A : B, we have, in the four cases 


          Log Ratio    Weight 
A : B      .18196      339.69 
A : B      .22624      304.19 
C : D      .20844      335.45 
C : D      .22197      308.44 
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from which we have the estimate of the weighted mean .20890, for the 
effect expressed as the natural logarithm of the measurement of doubling 
the tuberculin dosage. 


It may also be seen that four more 2 × 2 Latin Squares, in this 
case overlapping, are available to estimate the ratio A :C or B : D for 
which, using again natural logarithms, we have the estimates 


          Log Ratio    Weight 
A : C      .01102      424.94 
A : C     −.02517      308.42 
B : D     −.02269      328.87 
B : D      .03075      261.91 


the weighted mean being in this case —.00188. It will be noticed at 
this stage that the estimates are showing a remarkable consistency. 
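
Both sets of estimates, and their weighted means, can be recomputed from the 4 × 4 table of totals. In this sketch the helper names `square_estimate` and `weighted_mean`, and the zero-based row and column indices used to pick out each 2 × 2 square, are assumptions made for the illustration; rows are in the order 3+6, 4+5, 1+8, 2+7 and columns in the order I, III, II, IV.

```python
import math

TOTALS = [
    [454, 249, 349, 249],
    [408, 322, 312, 347],
    [523, 268, 411, 285],
    [364, 283, 266, 290],
]


def square_estimate(rows, cols):
    """Log ratio (diagonal treatment : off-diagonal treatment) and weight
    for the 2 x 2 square formed by the given pair of rows and of columns."""
    (r1, r2), (c1, c2) = rows, cols
    diagonal = (TOTALS[r1][c1], TOTALS[r2][c2])
    off_diagonal = (TOTALS[r1][c2], TOTALS[r2][c1])
    log_ratio = 0.5 * math.log(
        diagonal[0] * diagonal[1] / (off_diagonal[0] * off_diagonal[1])
    )
    weight = 4.0 / sum(1.0 / v for v in diagonal + off_diagonal)
    return log_ratio, weight


def weighted_mean(estimates):
    """Mean of the log ratios, each weighted by its harmonic-mean weight."""
    return sum(e * w for e, w in estimates) / sum(w for _, w in estimates)


# Double v. single dose (A : B and C : D): the four quarters of the square.
dose_squares = [((0, 1), (0, 1)), ((2, 3), (2, 3)), ((2, 3), (0, 1)), ((0, 1), (2, 3))]
# Standard v. Weybridge (A : C and B : D): the four overlapping squares.
prep_squares = [((0, 2), (0, 2)), ((1, 3), (1, 3)), ((1, 3), (0, 2)), ((0, 2), (1, 3))]

dose_effect = weighted_mean([square_estimate(r, c) for r, c in dose_squares])
prep_effect = weighted_mean([square_estimate(r, c) for r, c in prep_squares])
```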


The two estimations carried out above answer the practical question 
of the enquiry by assigning relative performance to the single and 
double doses of the two tuberculins used and show, in fact, that the 
Weybridge material was effectively a little more than twice as potent 
as the Standard. Questions of precision can, however, only be answered 
by constructing the expectations corresponding to the measurements 
observed. An approximate method of doing this, appropriate to cases 
like the present, in which all cells of the square are well occupied, is 
shown below. 


We have the measurements of logarithmic relative potency 


B    Standard single     0.0000 
A    Standard double     0.2089 
D    Weybridge half      0.0019 
C    Weybridge single    0.2108 


The antilogarithms of these give factors appropriate to the four treat- 
ments. Dividing the observed frequencies by these factors we have the 
adjusted frequencies 


 368.412     249         282.673     248.532     1148.617 
 408         261.297     311.413     281.053     1261.763 
 423.604     267.496     333.518     285         1309.618 
 363.316     229.216     266         235.329     1093.861 

1563.332    1007.009    1193.604    1049.914     4813.859 
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From the margins we can reconstruct the table so that the rows 
and columns are in strict proportion 


373.0208    240.2787    284.8014    250.5161 
409.7657    263.9476    312.8561    275.1936 
425.3070    273.9584    324.7219    285.6307 
355.2385    228.8243    271.2246    238.5736 


Each value may now be multiplied by the appropriate treatment factor, 
so as to give an approximate expectation. 


 459.6817     240.2787     351.6274     250.9876     1302.5754 
 409.7657     325.2684     312.4430     339.7652     1387.2423 
 525.1013     274.4740     400.1619     285.6307     1485.3679 
 355.9071     282.5158     271.2246     293.9995     1203.6470 

1750.4558    1122.5369    1335.4569    1170.3830     5378.8326 

Since these do not give exactly the original total, they may be reduced 
to the correct total, as in the following table. 

              459.781     240.331     351.704     251.042  |  1302.858    −1.858 
              409.855     325.339     312.511     339.839  |  1387.544    +1.456 
              525.215     274.534     400.249     285.693  |  1485.691    +1.309 
              355.984     282.577     271.283     294.063  |  1203.907    −0.907 

Expected     1750.835    1122.781    1335.747    1170.637  |  5380.000 
Observed     1749        1122        1338        1171 

Difference     −1.835      −0.781      +2.253      +0.363 


The marginal totals of this table of expectations, although not exactly 
equal to those of the observations on which they are based, are good 
approximations to these. Thus the column totals each of about 1300mm. 
have discrepancies −1.8, −0.7, +2.3, +0.4mm. only. With the rows 
the largest discrepancy is only −1.9mm., and with the letters 
(treatments) we have 


        Expected     Observed 
A       1479.432       1477       −2.432 
B       1207.162       1208       +0.838 
C       1499.335       1503       +3.665 
D       1194.071       1193       −1.071 


Thus our method, though only tentative and approximate, can be 
seen after the event to have given a very satisfactory approximation to 
the ideal fitting required. Owing to the importance of this type of 
problem in genetics, and the probability of further analogous cases in 
biological assay, the problem of making a sufficiently rapid and 
sufficiently accurate fitting of this kind seems to deserve further study. 
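
The steps of this approximate fitting (divide by the treatment factors, rebuild the table from its margins in strict proportion, multiply the factors back in, and reduce to the observed total) can be sketched as follows. The function name `approximate_expectations` is hypothetical, and because the treatment factors are taken here as the exponentials of the four rounded logarithms, the figures produced need not agree with the printed tables in every decimal.

```python
import math

OBSERVED = [
    [454, 249, 349, 249],
    [408, 322, 312, 347],
    [523, 268, 411, 285],
    [364, 283, 266, 290],
]
LETTERS = ["ABCD", "BADC", "CDAB", "DCBA"]
LOG_POTENCY = {"B": 0.0000, "A": 0.2089, "D": 0.0019, "C": 0.2108}


def approximate_expectations(observed, letters, log_potency):
    n = len(observed)
    factor = {k: math.exp(v) for k, v in log_potency.items()}
    # 1. Remove the treatment effect from every cell.
    adjusted = [
        [observed[i][j] / factor[letters[i][j]] for j in range(n)] for i in range(n)
    ]
    # 2. Rebuild the table so that rows and columns are in strict proportion.
    row_totals = [sum(row) for row in adjusted]
    col_totals = [sum(adjusted[i][j] for i in range(n)) for j in range(n)]
    grand = sum(row_totals)
    # 3. Multiply each value by its treatment factor again.
    fitted = [
        [row_totals[i] * col_totals[j] / grand * factor[letters[i][j]] for j in range(n)]
        for i in range(n)
    ]
    # 4. Reduce to the observed grand total.
    scale = sum(map(sum, observed)) / sum(map(sum, fitted))
    return [[v * scale for v in row] for row in fitted]


expectations = approximate_expectations(OBSERVED, LETTERS, LOG_POTENCY)
row_discrepancies = [sum(OBSERVED[i]) - sum(expectations[i]) for i in range(4)]
```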
Given sufficiently good expectations, we can calculate the 
ingredients $(a - m)^2/m$, the sum of which supplies the analogue to $\chi^2$ 
for the residual seven degrees of freedom. 


$(a - m)^2/m$ 


Sites 
1 + 8     .0093    .1555    .2888    .0017 
2 + 7     .1805    .0006    .1029    .0561 
3 + 6     .0727    .3127    .0208    .0166 
4 + 5     .0084    .0343    .0008    .1509 

Total     .2709    .5031    .4133    .2253     1.4126 mm. 


d.f.    Sum of Squares    m.s. 
  7        1.4126         .2018 mm. 
 12        2.92564        .2438 mm. 


In millimetres this comes to 1.4126, with a mean square .2018mm. only. 
This value is in good agreement with that obtained by contrasts be- 
tween the aggregate readings on sites treated alike, which for twelve 
degrees of freedom gives 2.92564, or .2438mm. as the average value. 
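
The sum of the ingredients can be verified directly from the observed totals and the table of expectations reduced to the total 5380. The variable names in this sketch are illustrative; the figures are those printed above.

```python
OBSERVED = [
    [454, 249, 349, 249],
    [408, 322, 312, 347],
    [523, 268, 411, 285],
    [364, 283, 266, 290],
]
EXPECTED = [
    [459.781, 240.331, 351.704, 251.042],
    [409.855, 325.339, 312.511, 339.839],
    [525.215, 274.534, 400.249, 285.693],
    [355.984, 282.577, 271.283, 294.063],
]

# Sum of (a - m)^2 / m over the sixteen cells.
chi_square_analogue = sum(
    (a - m) ** 2 / m
    for obs_row, exp_row in zip(OBSERVED, EXPECTED)
    for a, m in zip(obs_row, exp_row)
)
residual_df = 7
mean_square = chi_square_analogue / residual_df

# Comparison value from sites treated alike: 2.92564 on 12 degrees of freedom.
mean_square_like_sites = 2.92564 / 12
```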

These values are rather surprisingly less than those obtained directly 
in the preliminary test set out above. The indications of precision 
available from individual readings were, therefore, recalculated more 
exactly, treating each set of three cows of the same herd and treatment 
as a 3 × 8 frequency table, giving 21 degrees of freedom within the 
herd, and each set of trios, one from each of ten herds for a given treat- 
ment, thus supplying 63 degrees of freedom between herds. Owing to 
a few cows giving completely zero readings, we have not quite the full 
number of degrees of freedom available, but using the same readings, 
i.e. at 48 hours, as those used in the illustration above, we have 


                 Degrees of Freedom    Sum of Squares    Mean Square 

Within herds            539             242.7791 mm.      .4504 mm. 
Between herds           252             144.1697 mm.      .5712 mm. 


It is a puzzling feature, and one that I do not understand, that the 
comparisons used in the final aggregates should agree so appreciably 
more closely than do the individual readings on which they are based. 

The ratio of potency of equal weights of the two tuberculins was 
estimated, for the readings at the three periods used, to be as follows: 
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Period 48 hours 72 hours 96 hours 
Estimated ratio 2.009 2.141 2.172 


TABLE 1 

PERCENTAGE INCREASE IN MEASUREMENT (MEASURED LOGARITHMICALLY) 
ESTIMATED INDEPENDENTLY FOR EACH HERD 


                                    48 hrs.    72 hrs.    96 hrs. 

DOUBLE v. SINGLE DOSE 
                                     30.0       28.2       30.4 
                                     27.6       20.5       18.4 
                                     17.8       21.8       20.7 
                                     18.9       17.3       15.7 
                                     17.5       15.3      − 2.0 
                                     15.2        6.9      − 2.7 
Weighted mean                        21.1       21.5       16.4 

WEYBRIDGE v. STANDARD TUBERCULIN 
                                      6.8        7.5       11.2 
                                     11.3        7.5        8.4 
                                     11.3        3.7        0.9 
                                      2.6        4.2        6.8 
                                    − 3.0        5.9      −10.0 
Weighted mean                         0.48       3.19       3.03 

Relative potency of equal weight of Tuberculin, 
Weybridge v. Standard                2.032      2.217      2.274 
with fiducial limits                 2.341      2.505      2.727 
                                     1.764      1.961      1.897 
Estimate from aggregated data        2.009      2.141      2.172 
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To examine the consistency of the differential responses on which 
these estimates are based, and to obtain an appropriate standard error 
and fiducial limits for the estimates, a parallel process was applied to 
the ten constituent herds individually. (The original report then dis- 
cusses individual herds in detail at the different periods at which the 
swellings were read.) The herd values are shown in Table 1. It is 
upon these that the fiducial limits have been based. 
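
The relation between the two weighted means of Table 1 and the relative potency is not written out in the text. The sketch below rests on an assumption: since the Weybridge material was given at half the weight of the Standard, and doubling the dose raises the logarithm of the reading by the dose effect, the potency of equal weights is taken as 2 × 2^(preparation effect / dose effect). The function name `relative_potency` is hypothetical. Under this assumption the herd-based figures of Table 1 are recovered to within rounding.

```python
def relative_potency(prep_effect, dose_effect, dose_ratio=2.0, weight_ratio=2.0):
    """Potency of equal weights, Weybridge v. Standard (assumed relation).

    prep_effect : log increase when Standard is replaced by half its weight
                  of Weybridge (any units, so long as they match dose_effect)
    dose_effect : log increase produced by multiplying the dose by dose_ratio
    """
    return weight_ratio * dose_ratio ** (prep_effect / dose_effect)


# Weighted means from Table 1, in per cent (measured logarithmically).
potency_48 = relative_potency(0.48, 21.1)
potency_72 = relative_potency(3.19, 21.5)
potency_96 = relative_potency(3.03, 16.4)
```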

Table 2 gives the relative performance at the eight sites. Of these 
the most forward (1, 5) are the most sensitive, and the hindermost (3,7) 
are least so. It is obvious that there is no consistency in the differences 
between Right and Left. 


TABLE 2 
PROPORTIONATE RESPONSE AT EACH SITE 

Site     48 hours    72 hours    96 hours 
 1        1.141       1.131       1.143 
 2         .924        .931        .925 
 3         .931        .894        .892 
 4         .993        .990       1.017 
 5        1.100       1.145       1.107 
 6         .975        .990        .959 
 7         .895        .928        .930 
 8        1.040       1.029       1.027 

Total     7.999       8.000       8.000 


The complete data from the experiment are shown in Table 3. 


SUMMARY 


The above details and the result of the experiment reported have 
been published at the present time: partly in illustration of the fact 
that each type of reading which arises in biological assay deserves and 
may require the development for it of an appropriate theory of errors; 
secondly because previous work with tuberculin readings seems to have 
given no idea as to how they can be quantitatively interpreted; and 
thirdly because the precision of such readings regarded as a biological 
assay seems to have been much underrated. 
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TABLE 3 


TABLE OF INDIVIDUAL RESPONSES AT 48, 72, 96 HOURS 
(Data relate to 10 farms, 4 treatment classes at each, 3 cows per class). 
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ON A ONE-DIMENSIONAL DIFFUSION METHOD OF 
ASSAYING ANTIBIOTIC SUBSTANCES AND ITS 
FUNDAMENTAL FORMULAS 


MOTOSABURO MASUYAMA 
Hygieno-meteorological Laboratory, Central Meteorological Observatory, 


Tokyo. Institute of Physical Therapy and Internal Medicine, Tokyo Uni- 
versity. Institute of Statistical Mathematics, Department of Education. 


1. INTRODUCTION 


The cylinder-plate method or the Oxford method for penicillin assay 
is a direct extension of Fleming’s original finding and it is supposed to be 
more accurate than the ordinary dilution method. However, it needs 
relatively large amounts of ordinary nutrient agar and of penicillin 
solution, and it is not suitable, for example, for the estimation of penicillin 
concentration in serum. 

As a quantitative micromethod of assay, the present author and his 
colleagues, Dr. Torii and Dr. Kawakami, have devised “a one-dimensional 
diffusion method” or “a small test tube method” which is called 
in Japan “Zyûsôhô” (literally speaking “a superposition method”). In 
this paper the author deduces a fundamental formula for this method of 
assay. The experimental technique and the results will be published in 
detail elsewhere in Japanese by Torii and Kawakami (1). 


2. METHOD 


In the original form of the one-dimensional diffusion method a capil- 
lary tube was filled with an inoculated agar and its end was inserted in 
an ampule which contained a solution of penicillin. The test organism 
was Staphylococcus aureus (F.D.A. 209-P). The composition of the 
ordinary nutrient agar was as follows: 


beef infusion    1000cc 
NaCl             5g 
peptone          10g 
agar             10g 
(pH = 6.5). 
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The experimental results showed that only a small amount of the 
nutrient agar and of the specimen was needed but the technique was not 
so easy and the growth of bacteria not so good as was expected. To 
avoid technical difficulties Torii and Kawakami used small test tubes 
with cotton plugs instead of capillary tubes and ampules. The length 
and the diameter of the tube are H = 75mm and D = 4.5mm (“Murata 
test tube”) or H = 88mm and D = 8.0mm (“small test tube”) respectively. 


To get better results, there are at least three ways, i.e., 


(i) to use an anaerobe or a facultative anaerobe as the test organism 
with suitable oxygen donator or growth promoting substances 
(ii) to use a more sensitive strain of bacteria 
(iii) to use a color indicator as is used in the medium of Endo. 


In our Institute the following two methods are used 


(i) Streptococcus hemolyticus, Murata test tube and medium of 
following composition devised by Torii: 


ordinary nutrient agar 100cc 
defibrinated goat blood 10cc 
24 hours 1% blood beef broth culture 0.05cc 


(ii) Staphylococcus aureus, Escherichia coli, Shigella paradysenteriae 
or Eberthella typhosa, small test tube and medium of following 
composition devised by Kawakami: 


ordinary nutrient agar 100cc 
1% NaNO₃ solution 0.5cc 
0.1% methyleneblue solution 3.5cc 
24 hours beef broth culture 0.2cc 


At first the melted inoculated agar is put into the test tube 
(approximately 0.5cc for Murata test tube and 2.5cc for small test tube). After 
the agar has hardened, the solution of antibiotic substance is superposed 
on it. The amount of solution is approximately one third (for the 
Murata test tube) or one sixth (for the small test tube) of that of inocu- 
lated agar. For ordinary purpose three tubes which contain the same 
doses are necessary to control random errors of assay which are mini- 
mized when the temperature distribution in the incubator is homogene- 
ous. The test tubes are incubated 16 hours at 37°C. 

The lengths of inhibition zone are read at least to the nearest 0.5mm. 
In general there are two frontal surfaces in one tube, i.e., the front of the 
bacterial growth and that of the hemolysis or the decoloration of the 
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indicator. We usually read the bacterial front of streptococcus hemo- 
lyticus and the front of decoloration of other bacteria. Sometimes the 
front of decoloration is vague, especially when the tested solution is a 
culture filtrate. In such a case the deep colored ring in the colored region 
is used as a front. If we use a more diluted suspension of bacteria and a 
more concentrated indicator, the front of hemolysis or decoloration, 
which is formed by the diffusion of the active principles produced by 
inoculated bacteria, approaches the bacterial front. According to 
Kawakami’s experiments carried out recently it would be better to use 
0.02cc of 24 hours beef broth culture and 3.8cc of 0.1% methyleneblue 
solution. Kawakami’s medium should not be exposed in the direct 
daylight, since the methyleneblue acts as an antibiotic substance in the 
daylight and the leuco base of methyleneblue is oxidized by the daylight. 

All antibiotic substances except Tapecilline (the commercial name 
of a sort of culture filtrate of Penicillium) used by our colleagues are highly 
purified, but according to our friends’ private communication our method 
is applicable in the factory for crude extracts with or without slight 
modifications (low concentration of bacterial suspension, other color 
indicator). 


3. EMPIRICAL FORMULA 


Let the concentration of the antibiotic substance and the length of 
inhibition zone be C and y respectively, then there exists an empirical 
formula 


(3.1)    $y = G\left[1 - e^{-r(x - a)}\right]$ 


where $x = \log C$. 
The following limiting results are apparent: 


(i) $y = G$, for $x \to \infty$ 


(iii) $y = 0$, for $x = a$. 


This formula holds well for penicillin and streptococcus hemolyticus 
or staphylococcus aureus for the range 0.0244 to 200 units per cc. and 
for patulin, bromsalicil or tapecilline and other bacteria above cited. 
The formula is valid for the front of hemolysis or of decoloration or ring; 
the numerical values of r and G are nearly equal to each other but they 
are different from the value of a. 
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TABLE 1 


THE LENGTHS OF INHIBITION ZONES IN MM AND THE CONCENTRATIONS 
OF PENICILLIN SOLUTION IN U/CC. (FIGURE 1) 


B ... bacterial front        H ... front of hemolysis 

             10             10/4           10/4²          10/4³          10/4⁴ 
          B      H       B      H       B      H       B      H       B      H 
         21.5   20.8    17.9   16.8    14.7   13.8    10.0    9.1     3.5    3.0 
         21.2   20.1    18.4   17.5    14.8   13.6     9.9    9.6     4.3    3.7 
         21.8   21.0    18.2   17.2    14.1   13.4    10.3    9.5     4.2    3.5 
mean     21.50  20.63   18.17  17.17   14.53  13.60   10.07   9.40    4.00   3.40 


To test the goodness of fit, we use the following equation of finite 
differences. Let w be any given constant, and eliminating the unknown 
parameter a from the following equation (3.2) and from the previous 
one (3.1) 


(3.2)    $y(x + w) = G\left[1 - e^{-r(x + w - a)}\right]$ 
we obtain an equation of finite differences 
(3.3)    $y(x + w) = e^{-rw}\, y(x) + G\left[1 - e^{-rw}\right]$. 


The latter equation shows that though we do not know a priori the 
numerical values of three parameters in (3.1), we know that there should 
exist a linear relation between y(x + w) and y(x), i.e., plotting y(x + w) 
against y(x) we have points on a straight line. From this relation we can 
estimate at first $e^{-rw}$ or $r$ and then $G[1 - e^{-rw}]$ or $G$. Substituting these 
values in (3.1), we can estimate the numerical value of the location 
parameter a. 
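
As an illustration, the straight line of the finite differences diagram can be fitted to the mean bacterial-front readings of Table 1, where successive doses differ by a factor of 4. The text speaks only of plotting the points on a straight line; the use of ordinary least squares here, and the function names `fit_finite_difference` and `location_parameter`, are assumptions made for the sketch. Common logarithms are used for the dose and the natural exponential for the formula, which is also an assumption about units.

```python
import math


def fit_finite_difference(responses, dose_ratio):
    """Fit y(x + w) = slope * y(x) + intercept by least squares.

    responses: mean zone lengths ordered from the highest dose to the lowest,
               successive doses differing by the factor dose_ratio.
    Returns (slope, intercept, r, G) with slope = exp(-r w), w = log10(dose_ratio),
    and intercept = G * (1 - slope).
    """
    lower = responses[1:]    # y(x)
    higher = responses[:-1]  # y(x + w)
    n = len(lower)
    mean_x = sum(lower) / n
    mean_y = sum(higher) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lower, higher))
    sxx = sum((x - mean_x) ** 2 for x in lower)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    w = math.log10(dose_ratio)
    r = -math.log(slope) / w
    G = intercept / (1.0 - slope)
    return slope, intercept, r, G


def location_parameter(y, x, r, G):
    """Solve y = G[1 - exp(-r (x - a))] for a."""
    return x + math.log(1.0 - y / G) / r


bacterial_front = [21.50, 18.17, 14.53, 10.07, 4.00]  # doses 10, 10/4, ..., 10/4^4
slope, intercept, r, G = fit_finite_difference(bacterial_front, 4.0)
a = location_parameter(bacterial_front[0], math.log10(10.0), r, G)
```

The slope and intercept do not involve a, which is why points from a standard and from an unknown fall on the same line.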

To estimate the potency of an unknown solution compared with a 
standard, the fact that (3.3) is independent of the location parameter a 
is very important, for the points of the unknown and that of the standard 
should be on the same line in the y(x) — y(x + w) diagram. To estimate 
the potency, we need at least 


(i) three standards of different doses $S_H$, $S_M$ and $S_L$ where the ratios 
of two successive doses are the same (let it be A) and one 
unknown U, or 

(ii) two standards of high and low doses $S_H$ and $S_L$ and two unknowns 
of high and low doses $U_H$ and $U_L$ where both ratios of doses are 
the same (let them be A). 
[FIGURE 1. FINITE DIFFERENCES DIAGRAM FOR TABLE 1. Penicillin-Strept. hemolyt. Ordinate y(x + w), abscissa y(x). Open circles: front of bacteria; filled circles: front of hemolysis.] 


Let us use the suffices u and s for the unknown and the standard in the 


following. 


In any assay in which an unknown and a standard are used simultaneously,
the potency $\theta$ of the unknown is the ratio of the doses of
unknown and standard that produce the same response. Consider the
log dose response curve of S and that of U. Displace the curve of U by
Q to the right so as to place it upon the curve of S; then we have on the
one hand

(3.4)    $a_s - a_u = Q$

and on the other hand

(3.5)    $\log C_s - \log C_u = Q$,

where

(3.6)    $Q = \log \theta$.


We shall deduce a formula to estimate the potency in the above cited 
cases. 

I. The diameters are here labeled $u_L$ and $u_H$ for the low and high
doses of the unknown, and $s_L$ and $s_H$ for the low and high doses of the
standard. The equation of the straight line through the two points $(u_L, u_H)$
and $(s_L, s_H)$ on the diagram of y(x + w) against y(x), where w = log A, is

(3.7)    $y(x + w) = \frac{u_H - s_H}{u_L - s_L}\, y(x) + \frac{u_L s_H - u_H s_L}{u_L - s_L} = e^{-rw}\, y(x) + G\,[1 - e^{-rw}]$,

and accordingly we have

(3.8)    $e^{-rw} = \frac{u_H - s_H}{u_L - s_L}$

and

(3.9)    $G = \frac{u_L s_H - u_H s_L}{u_L - s_L - u_H + s_H}$.


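If we transform (3.1) utilizing (3.9), we have

(3.10)    $e^{-r(x - a)} = 1 - \frac{y}{G} = 1 - \frac{(u_L - s_L - u_H + s_H)\, y}{u_L s_H - u_H s_L}$.

Putting $x_u = x_s$, we have

(3.11)    $\frac{s_H - s_L}{u_H - u_L} = \left(\frac{u_L - s_L}{u_H - s_H}\right)^{Q/w}$.

Taking common logarithm of both sides we obtain

(3.12)    $\log \frac{s_H - s_L}{u_H - u_L} = \frac{Q}{w}\, \log \frac{s_L - u_L}{s_H - u_H}$,

(3.13)    $\log \theta = \frac{\log \dfrac{s_H - s_L}{u_H - u_L}}{\log \dfrac{s_L - u_L}{s_H - u_H}}\, \log A$.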


Nomograms which facilitate the calculation of the potency by this formula
have been made by the author and published by the Japanese
Penicillin Association, Department of Welfare and Public Health. The
nomograms consist of two parts, viz., the nomogram for Z = log (P/Q)
and that for $\log \theta = (Z_1/Z_2) \log A$.
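As a cross-check on the four-point calculation, the sketch below evaluates the potency as A raised to the power log[(s_H - s_L)/(u_H - u_L)] / log[(s_L - u_L)/(s_H - u_H)]. This form was rederived here from (3.1) and (3.3) and should be compared with the printed formula (3.13) before use. The function name and the synthetic parameter values are hypothetical.

```python
import math

def potency_four_point(u_low, u_high, s_low, s_high, dose_ratio):
    """Potency theta from two doses of unknown and standard (assumed form of 3.13).

    The base of the logarithm cancels, so natural logs are used.
    """
    num = math.log((s_high - s_low) / (u_high - u_low))
    den = math.log((s_low - u_low) / (s_high - u_high))
    return dose_ratio ** (num / den)

# Synthetic check: zone lengths generated from (3.1) with a known potency.
G, r, a_s, theta, A = 30.0, 0.5, 0.0, 2.0, 4.0
a_u = a_s - math.log(theta)  # (3.4) with Q = log(theta)

def zone(x, a):
    return G * (1.0 - math.exp(-r * (x - a)))

x_low = 1.0
x_high = x_low + math.log(A)
est4 = potency_four_point(zone(x_low, a_u), zone(x_high, a_u),
                          zone(x_low, a_s), zone(x_high, a_s), A)
print(est4)
```

The formula is undefined when unknown and standard give identical readings, since both logarithms then involve a ratio of zeros.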

II. Let the lengths of inhibition zones for the high, middle, and low
dose of S be h, m, and l respectively and that for U be u. Then by the
same method of approach we have

(3.14)    $\log \theta = \frac{\log \dfrac{(G - u)(2m - l - h)}{(m - l)^2}}{\log \dfrac{h - m}{m - l}}\, \log A$,

where $\theta$ is the ratio of concentration of low S and U, and we put

(3.15)    $G = \frac{m^2 - lh}{2m - l - h}$.

There is an indeterminate case, where the equation

(3.16)    $2m - l - h = 0$

holds by random errors. This case occurs in a narrow range where the
log dose response curve could be treated as a straight line. In such a
case u may be estimated from this straight line.
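The three-standard case, including the indeterminate condition (3.16), can be sketched in the same way. The expression for G follows (3.15). The ratio inside the numerator logarithm is written as (G - u)/(G - l), which equals (G - u)(2m - l - h)/(m - l)^2 by algebra. The function name, the tolerance, and the synthetic values are assumptions made for illustration.

```python
import math

def potency_three_point(l, m, h, u, dose_ratio, tol=1e-9):
    """Potency of U relative to the low standard from zone lengths l, m, h, u."""
    curvature = 2.0 * m - l - h
    if abs(curvature) < tol:
        # Indeterminate case (3.16): the response is linear in log dose.
        raise ValueError("2m - l - h = 0; use the straight-line method instead")
    G = (m * m - l * h) / curvature            # (3.15)
    num = math.log((G - u) / (G - l))
    den = math.log((h - m) / (m - l))
    return dose_ratio ** (num / den)

# Synthetic check with responses generated from (3.1).
G0, r0, A0, theta0 = 30.0, 0.5, 4.0, 2.0
def zone3(x, a):
    return G0 * (1.0 - math.exp(-r0 * (x - a)))

x0, w0 = 1.0, math.log(A0)
est3 = potency_three_point(zone3(x0, 0.0), zone3(x0 + w0, 0.0),
                           zone3(x0 + 2 * w0, 0.0),
                           zone3(x0, -math.log(theta0)), A0)
print(est3)
```

In practice the curvature 2m - l - h may be small without vanishing, in which case G is poorly determined and the straight-line treatment mentioned in the text is the safer choice.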


4. THEORETICAL FORMULA 


Let u(y, t) be the concentration of the antibiotic substance at a point
y at a time t and the coefficient of diffusion be D. Then the differential
equation for one-dimensional diffusion in porous medium is

(4.1)    $\frac{\partial u}{\partial t} = D\, \frac{\partial^2 u}{\partial y^2}$


by Darcy’s law. The fundamental assumption of the one-dimensionality 
is based on three experimental facts, i.e., 


(i) the diameter of the test tube may vary in a certain range to get 
sufficiently accurate data, 
(ii) the front of hemolysis or decoloration is nearly even, i.e., there is 
approximately no formation of the meniscus, 
(iii) the minimal effective dose which is estimated by the formula 
deduced from (4.1) is nearly equal to the mean of the estimates 
determined by the ordinary dilution method. 


If the length K of the inoculated agar column in the test tube is
sufficiently large compared with the length of the inhibition zone, we
can assume that K is infinite. Let the initial and the boundary condition
be

(4.2)    t = 0;  u = 0  for y > 0, and

(4.3)    y = 0;  u = C  for t > 0.

Then a well-known solution of the equation (4.1) is

(4.4)    $u = C \left[ 1 - \frac{2}{\sqrt{2\pi}} \int_0^{z} e^{-\xi^2/2}\, d\xi \right]$,

where $z = y/\sqrt{2Dt}$.

FIGURE 2. GRAPHICAL VERIFICATION OF THE FORMULA (4.5), UTILIZING THE
ESTIMATE k = 0.020 u/cc OBTAINED FROM FIGURE 3.

If we let the minimal effective dose and the lag phase be k and $\tau$
respectively, then we have as our basic formula

(4.5)    $\frac{k}{C} = 1 - \frac{2}{\sqrt{2\pi}} \int_0^{y/f} e^{-t^2/2}\, dt$,

where $f = \sqrt{2D\tau}$. To estimate k and f simultaneously by the observed
data, we apply Williams' approximation:

(4.6)    $\frac{2}{\sqrt{2\pi}} \int_0^{y} e^{-t^2/2}\, dt \approx \sqrt{1 - e^{-2y^2/\pi}}$.

The relative error of this approximation is for all ranges of y at most
0.7%. Utilizing this sufficiently accurate approximation we have

(4.7)    $\frac{k}{C} = 1 - \sqrt{1 - e^{-2y^2/(\pi f^2)}}$.

When 2C is sufficiently large compared with k, we have

(4.8)    $\log C - \log 2k = 2y^2/(\pi f^2) = y^2/(\pi D \tau)$.


This equation indicates that if $y^2$ is plotted against x = log C, the plotted
points will be on a straight line. The point of intersection of this line
and the x or horizontal axis and the slope of the line give the numerical
value of log 2k or k and that of $2/(\pi f^2)$ or f respectively.
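Two points in this passage lend themselves to a numerical sketch. The first part below checks the size of the error of the square-root approximation, taken here in the form sqrt(1 - exp(-2y^2/pi)), which is an assumption about what is meant by Williams' approximation. The second part recovers k and f from a least-squares line of y^2 on x = log C, using synthetic data generated from (4.8) with hypothetical values of k and f and natural logarithms.

```python
import math

def williams(y):
    """Assumed form of the approximation to (2/sqrt(2 pi)) * int_0^y exp(-t^2/2) dt."""
    return math.sqrt(1.0 - math.exp(-2.0 * y * y / math.pi))

# Largest relative error on a grid, against the exact value erf(y / sqrt(2)).
grid = [i / 100.0 for i in range(1, 601)]
max_rel_err = max(abs(williams(y) / math.erf(y / math.sqrt(2.0)) - 1.0)
                  for y in grid)

# Synthetic zone lengths from (4.8); k_true and f_true are hypothetical.
k_true, f_true = 0.02, 6.0
x = [math.log(100.0 / 4.0 ** i) for i in range(5)]
y2 = [(math.pi * f_true ** 2 / 2.0) * (xi - math.log(2.0 * k_true)) for xi in x]

# Least-squares line y^2 = b (x - x0): x0 = log 2k and b = pi f^2 / 2.
n = len(x)
mx, my = sum(x) / n, sum(y2) / n
b = (sum((p - mx) * (q - my) for p, q in zip(x, y2))
     / sum((p - mx) ** 2 for p in x))
x0 = mx - my / b
k_est = math.exp(x0) / 2.0
f_est = math.sqrt(2.0 * b / math.pi)
print(max_rel_err, k_est, f_est)
```

With real readings the points at the lowest concentrations should be left out of the fit, because (4.8) assumes that 2C is large compared with k.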

Utilizing this theoretical formula we can estimate how the length of
the inhibition zone increases when the test tube has been stored in a
refrigerator before incubation, because D in f varies proportionally to
the absolute temperature (approximately) and $\tau$ increases by duration
of refrigeration. As the biological or physical meanings of the three
parameters k, D, and $\tau$ are clear in this case, theoretically speaking, (4.8)
is better than (3.1) if the former fits the actual data. Formally speaking,
in (3.1) y = 0 corresponds to the minimal effective dose, or in other
words, there seems to exist the relation

(4.9)    $k = e^{a}$,

but we have no evidence which indicates the validity of the empirical
formula for sufficiently small values of y. The uncertainty of such an
induction may be easily seen in (4.8), for if we take (4.8) as an empirical
formula, y = 0 corresponds to C = 2k, i.e., the estimated minimal
effective dose is twice as large as the true one.

To test whether or to what extent this formula holds, it is desirable
to dilute the solution in question in geometric progression; then the
x are in arithmetic progression and accordingly the corresponding $y^2$
are also in arithmetic progression. As the actual tube has finite length,
there is a systematic deviation for large values of y, especially when we
use Murata test tubes. In such a case we should use the longer test
tubes.

FIGURE 3. DIAGRAM OF $y^2$ AGAINST log C
Penicillin-Strept. hemolyt.
Upper and lower curves correspond respectively to $y_2$ and $y_1$ of Table 2.
$y_1$ ... the solution is immediately superposed
$y_2$ ... the solution is superposed after 24 hours refrigeration

TABLE 2

THE MEAN LENGTHS OF INHIBITION ZONES IN MM AND CONCENTRATIONS
OF PENICILLIN SOLUTION IN U/CC.
(FIGURE 2 AND 3)

C           y_1     y_2     y/f

100         23.9    28.6    3.72
100/4       18.1    27.5    3.35
100/4^2     16.7    22.7    2.95
100/4^3     14.2    18.7    2.49
100/4^4     10.4    14.9    1.95
100/4^5      6.97    9.23   1.27
100/4^6              1.97   0.23


This formula is valid for penicillin and streptococcus hemolyticus
or staphylococcus aureus, for patulin and staphylococcus aureus,
ebertella typhosa or shigella paradysenteriae and for tapecilline and the
various bacteria cited above.

To estimate the potency of an unknown solution U absolutely by the
formula (4.8), dilute the solution U in geometric progression with a
common ratio A; then the first differences of $y^2$ should be constant within
a certain range where the assumptions hold. Let the concentration of
the original solution be W; then that of the (n + 1)st solution is
$C_n = W A^{-n}$. Now the equation (4.8) gives

(4.10)    $\log C_n - \log 2k = \log W - \log 2k - n \log A = 2y_n^2/(\pi f^2)$,

where $y_n$ means the length of the inhibition zone for $C_n$. Plotting $y_n^2$
against n, we can estimate log W - log 2k or W/k, i.e., the multiple of
the minimal effective dose.
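A minimal sketch of this absolute estimate follows. It assumes a least-squares line of the squared zone lengths on n in place of the plot, and it reads W/k from the value of n at which the fitted line reaches zero, since (4.10) then gives log W - log 2k = n log A. The function name and the synthetic values of W, k, f and A are hypothetical.

```python
import math

def multiple_of_med(n_values, zone_lengths, dose_ratio):
    """Estimate W/k from (4.10) by fitting y_n^2 linearly against n."""
    y2 = [y * y for y in zone_lengths]
    cnt = len(n_values)
    mn, my = sum(n_values) / cnt, sum(y2) / cnt
    slope = (sum((p - mn) * (q - my) for p, q in zip(n_values, y2))
             / sum((p - mn) ** 2 for p in n_values))
    n_zero = mn - my / slope           # n at which the fitted y^2 is zero
    return 2.0 * dose_ratio ** n_zero  # W / k = 2 A^(n_zero)

# Synthetic check: W = 100 u/cc, k = 0.02 u/cc, f = 6 mm, A = 4.
W, k, f, A = 100.0, 0.02, 6.0, 4.0
ns = [0, 1, 2, 3, 4]
ys = [math.sqrt((math.pi * f * f / 2.0)
                * (math.log(W / (2.0 * k)) - n * math.log(A))) for n in ns]
ratio_est = multiple_of_med(ns, ys, A)
print(ratio_est)
```

Only dilutions within the range where the first differences of the squared lengths are roughly constant should enter the fit.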

According to the experimental data which were made by my col- 
leagues using sodium penicillin G, the numerical values of k estimated 
by (4.8) for staphylococcus aureus (F.D.A. 209-P) lay in a narrow range 
which contained k = 0.02 units per cc; the estimates based on the means 
of readings of three test tubes, distributed in a range from 0.015 to 0.030 
units per cc. The estimated values of k were approximately constant 
within a series of experiments done simultaneously, but were slightly 
different from day to day, even though each set of data agreed very well 
with the theoretical curve. The reason for such a fluctuation is not clear 
at present. According to our theoretical formula the concentration of 
agar or the variation of lag phase due to various thermal conditions 
might vary the numerical value of f but should not vary that of k. 
There remain at least two possibilities: 


(i) k may vary under various conditions 
(ii) the front may move after its formation. 


To estimate k absolutely, the front of decoloration may not be suitable, 
for it is formed by the diffusing active principles. In routine work, it 
would be convenient to use semi-logarithmic paper in which a square 
scale is taken along the vertical axis. Then research workers may learn 
the potency without any calculation by plotting y against C. 

To estimate the potency of an unknown solution U compared with a
standard solution S, we can use the well-known formula

(4.11)    $\log \theta = \frac{U_H - S_H + U_L - S_L}{S_H - S_L + U_H - U_L}\, \log A$,

where the squares of the measured diameters are labeled $U_L$ and $U_H$ for
the low and high doses of the unknown, and $S_L$ and $S_H$ for the low and
high doses of the standard.
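Formula (4.11) is short enough to state as a function. The sketch below takes the measured lengths, squares them, and returns log theta in whatever base is used for log A. The function name and the synthetic check values are hypothetical.

```python
import math

def log_potency_from_squares(u_low, u_high, s_low, s_high, log_dose_ratio):
    """log(theta) by (4.11); inputs are measured lengths, squared internally."""
    U_L, U_H, S_L, S_H = (v * v for v in (u_low, u_high, s_low, s_high))
    return (U_H - S_H + U_L - S_L) / (S_H - S_L + U_H - U_L) * log_dose_ratio

# Synthetic check: squared lengths linear in log dose, as in (4.8).
c, log2k, theta, A = 50.0, math.log(0.04), 2.0, 4.0
def length(log_conc):
    return math.sqrt(c * (log_conc - log2k))

x_lo, x_hi = 0.0, math.log(A)
log_theta_est = log_potency_from_squares(
    length(x_lo + math.log(theta)), length(x_hi + math.log(theta)),
    length(x_lo), length(x_hi), math.log(A))
print(math.exp(log_theta_est))
```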
Finally we want to consider the meaning of the empirical formula,
compared with the theoretical one. Transforming our empirical formula
(3.1), we have


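(4.12)    $y = G\,[1 - \exp\{-r(\log C - a)\}] = G\,[1 - (k/C)^{r}]$

and accordingly

(4.13)    $1 - \frac{k}{C} = 1 - \left(1 - \frac{y}{G}\right)^{1/r}$,

where we put log k = a, assuming the validity of (4.9). Comparing
(4.5) with (4.13) we can conclude that the empirical formula is based on a
parabolic approximation of the probability integral, i.e.,

(4.14)    $\frac{2}{\sqrt{2\pi}} \int_0^{y/f} e^{-t^2/2}\, dt \approx 1 - \left(1 - \frac{y}{G}\right)^{1/r}$.
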
To know the order of approximation we let the left side of (4.14) be 
p(y/f); then if both sides of (4.14) are exactly equal to each other, the 
equation 


(4.15) t=y/f 
should hold for every value of t. Transforming (4.15) we have
(4.16) 


which has the same functional form as (3.1), and we can test the validity 
of the equation (4.16) by the calculus of finite differences. 

The exact solution of the differential equation (4.1) under the more
plausible initial and boundary conditions,

          t = 0:  u = 0,
(4.17)    y = 0:  u = C  (t > 0),
          y = K:  $\partial u/\partial y = 0$,

has been obtained in the form of a Fourier series. However, we can
hardly utilize this form of solution, because the unknown parameters are
included in each term of the series.


5. CONCLUSION 


The author and his colleagues, Dr. Torii and Dr. Kawakami, have
picked up the essential part of the cylinder plate method of assaying
penicillin and devised a new method of assaying antibiotic substances.
The basic idea is to utilize the one-dimensional diffusion with suitable
indicators. The empirical formula (3.1) and the theoretical one (4.8)
fit very well the data obtained by Torii, Kawakami, and Kozima. Even
though formulas (3.1) and (4.5) hold for a wider range, they cannot be
used to estimate the potency without a standard. To estimate the potency
without any standard, formula (4.8) is useful, provided that 2C is
sufficiently large compared with k and that K is sufficiently large. Our
superposition method of the absolute measurement of the potency seems to be
better than the ordinary dilution method. In the dilution method only two
successive test tubes are used to estimate the potency and other test tubes
remain unused. Furthermore, it has one serious defect, i.e., in the dilution
method we must determine the point of contact of the dose response
curve on the dose axis. It is well known that this is very difficult and
inaccurate. In our method we use all sufficiently accurate readings of
all the test tubes without turbidimeter or colorimeter to estimate the
potency, and we can estimate the error of estimated potency by utilizing
the analysis of covariance. However, this method of assaying antibiotic
substances might not be suitable for the electro-positive substances, for
the latter are adsorbed by the electro-negative agar and the length of
the inhibition zone is too short to estimate its potency. In such a case
it would be desirable to use their neutral or electro-negative double salts.

It is worth noting that with a slight modification of the interpretation
of the parameters the formulas might be applicable in a wider field of
assaying chemicals and biological products. In fact, the author and
Dr. Okawara have applied formula (4.8) to estimate the potency of the
pepsin solution, using the edestin-sodium chloride agar and the pepsin
solution in place of the inoculated agar and the penicillin solution. In
this case the length of the digestion zone y increases proportionally to
the square root of the time of reading t (at least within several hours),
which is expected naturally from (4.8).

The theory of errors in the estimation of potency will be developed 
in the near future, being based on the observed data. At present to 
estimate the potency absolutely the analysis of covariance method is 
applicable. The classical method of approach, i.e., the theory of large 
samples as is used in Knudsen & Randall’s paper (2) might not be 
suitable for small samples used in routine work.


ROUTINE COMPUTATION OF BIOLOGICAL ASSAYS
INVOLVING A QUANTITATIVE RESPONSE


M. J. R. HEALY


Rothamsted Experimental Station 


SUMMARY 


A NOMOGRAM is described for rapid computation of routine biological 
assays using a 6-point design. 


INTRODUCTION 


Many biological assays involving a quantitative response are based 
on a straight line relationship between response and the logarithm of 
the dose. 

A common design for such assays is the 4-point design, in which two 
doses of the unknown preparation are compared with two doses of a 
standard. Several writers have discussed the routine computation of 
such assays, and have constructed nomograms for the purpose (Knudsen, 
1945; Knudsen and Randall, 1945; Bliss, 1946). 

In a general survey of assay problems, Finney (1947) has pointed 
out that the 4-point design does not give a test of linearity of the dose- 
response curve, and hence of the assumptions on which the assay is 
based; he recommends the use of at least three levels of each preparation, 
giving rise to a 6-point assay. When a large number of these assays 
are being carried out, it is often possible to standardise the experimental 
procedure so that the spacing between doses and the number of replicates 
at each dose remain constant; in this case the computation is much 
reduced by the use of the nomogram described below. 


COMPUTING PROCEDURE 


The doses of both preparations should be related by a constant dilution
factor I, giving equal spacing on the log. scale. It is supposed that
n readings are taken at each dose.

TABLE 1
COMPUTING SHEET FOR 6-POINT ASSAY

                     Standard                          Unknown
Dose           0         1         2             0         1         2
              48        60        84            56        76
              40        76        84            56        76
              46        62        84            44        77
Sum      S0 = 134  S1 = 198  S2 = 252      U0 = 156  U1 = 229  U2 = 270
Range          8        16         0            12         1         4

Total Range R = 41                 Dilution Factor I = 1.5

S2 + S1 + S0 = +584                U2 + U1 + U0 = +655
S2 - S0 = +118                     U2 - U0 = +114
S2 - 2S1 + S0 = -10                U2 - 2U1 + U0 = -32

Materials       D = (U2 + U1 + U0) - (S2 + S1 + S0) = + 71
Slope           B = (U2 - U0) + (S2 - S0) = +232
Parallelism     P = (U2 - U0) - (S2 - S0) = -  4
Curvature i     H = (U2 - 2U1 + U0) + (S2 - 2S1 + S0) = - 42
Curvature ii    K = (U2 - 2U1 + U0) - (S2 - 2S1 + S0) = - 22
Limits (5% level): 29.9 for P; 51.7 for H and K

D/B = + 0.306                      D/R = + 1.73
Relative potency = 1.087 - 1.180 - 1.294 (5% limits)


A suitable computing sheet is shown in Table 1. The responses are
entered on the sheet and their sums and ranges are found. The ranges
are totalled to give a value R and the sums are combined, using "factorial
coefficients" (Emmens, 1948, p. 92), as described on the sheet. For the
assay to be considered a valid one at a given level of significance, the
last three totals must not numerically exceed certain values; the limit
for P is $t_1 R$, and the limit for H and K is $t_2 R$, where $t_1$, $t_2$ are factors
tabulated, Table 2, for significance levels of 5% and 1%.

Provided the assay proves to be valid, the estimate of relative 
potency is found from the nomogram shown in Fig. 1. The quantities 
D/B and D/R are calculated and the corresponding points marked off 
on the scales AA’ and BB’. The points are joined, and the relative 
potency with its limits of error can be read off from the scales on OO’, 
PP’ and QQ’. 

In this scheme, the precision of the assay is determined by the use
of range in place of standard deviation, in order to avoid computing
sums of squares. The resulting estimate of error is not the best possible
one except when n = 2; in addition, it may be biased if the underlying
law of variation is not normal. These drawbacks must be set against
the added convenience of the use of range. They are unlikely to be of
practical importance when n is small, as will usually be the case.

TABLE 2
FACTORS FOR ASSESSING THE VALIDITY OF THE ASSAY

               t1                  t2
 n         5%      1%          5%      1%
 2       0.994   1.520        1.72    2.63
 3       0.730   1.034        1.26    1.79
 4       0.674   0.926        1.17    1.61
 5       0.657   0.894        1.14    1.55
 6       0.654   0.883        1.13    1.53
 7       0.659   0.887        1.14    1.54
 8       0.666   0.894        1.15    1.55
 9       0.677   0.902        1.17    1.56
10       0.685   0.914        1.19    1.58

FIGURE 1. NOMOGRAM FOR 6-POINT ASSAY
n = 3; I = 1.5; 5% limits.
LOCATE D/B ON AA' AND D/R ON BB'. JOIN, AND READ RELATIVE POTENCY ON OO'
AND LIMITS OF ERROR ON PP', QQ'.


NUMERICAL EXAMPLE 


Table 1 gives the results of an assay of nisin, an antibiotic substance
derived from Str. lactis (Mattick and Hirsch, 1947). A strain
of the acid-forming bacteria, Str. agalactiae, is grown in the presence of
a sub-lethal quantity of the antibiotic under controlled conditions, and
it is found that the amount of acid produced after about 12 hours is
related to the quantity of nisin. The exact form of the relationship is
complicated, but a useful linear range can be obtained by plotting pH
against log. dose, and this fact has been used in the assay under
consideration. A standard preparation containing 50 units/ml. of nisin
was used as dose $S_2$, and diluted twice in the ratio 1.5 : 1 to give doses
$S_1$ and $S_0$. Similarly, the unknown preparation and two successive
dilutions from it were used for doses $U_2$, $U_1$ and $U_0$. The responses
(which are given in terms of 100 × (pH - 5)) are combined as described
above to give D/B = + 0.306, D/R = + 1.73. The assay is
shown to be valid by comparing the values of P, H and K with the
limits of $t_1 R$ and $t_2 R$, and the use of the nomogram gives the relative
potency as 1.180, with 5% limits of 1.087 to 1.294. Thus, as the standard
contains 50 units of nisin per ml., the unknown is estimated to
contain 59.0 units/ml. with 5% limits of 54.4 to 64.7 units/ml. I am
indebted to Dr. A. Hirsch of the National Institute for Research in
Dairying for permission to use these figures.
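The arithmetic of the computing sheet can be retraced in a few lines. The sketch below starts from the dose totals and total range of Table 1, with the factors for n = 3 at the 5% level taken from Table 2, and obtains the relative potency as I raised to the power 4D/3B instead of reading it from the nomogram. The function and key names are hypothetical.

```python
def six_point(S, U, R, t1, t2, dilution):
    """S, U: dose totals (S0, S1, S2) and (U0, U1, U2); R: total range."""
    D = sum(U) - sum(S)                               # materials
    B = (U[2] - U[0]) + (S[2] - S[0])                 # slope
    P = (U[2] - U[0]) - (S[2] - S[0])                 # parallelism
    cu = U[2] - 2 * U[1] + U[0]
    cs = S[2] - 2 * S[1] + S[0]
    H, K = cu + cs, cu - cs                           # curvature i and ii
    valid = abs(P) <= t1 * R and abs(H) <= t2 * R and abs(K) <= t2 * R
    potency = dilution ** (4.0 * D / (3.0 * B))
    return {"D": D, "B": B, "P": P, "H": H, "K": K,
            "valid": valid, "potency": potency}

res = six_point((134, 198, 252), (156, 229, 270),
                R=41, t1=0.730, t2=1.26, dilution=1.5)
print(res)
print(50 * res["potency"])  # units/ml., standard at 50 units/ml.
```

The limits of error are not produced by this sketch; they require the range-based factor discussed under the construction of the nomogram.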


CONSTRUCTION OF THE NOMOGRAM 


The ordinary theory of biological assay (see, for example, Emmens,
1948) shows that the estimate of the logarithm of relative potency is
given by log I × 4D/3B. If we write $a = \sqrt{2}\,D/(\sqrt{3}\,B)$, it can be
shown that the fiducial limits of a are given by the roots of the quadratic
equation

$(\sqrt{2}\,D - \sqrt{3}\,B a)^2 = 12 n t^2 s^2 (1 + a^2)$

where $s^2$ is the error variance of a single response and t is the appropriate
deviate of the ordinary Student distribution. In the present application,
t is replaced by an analogous quantity which allows for the replacement
of s by the sample range (Lord, 1947). The limits of the
estimate of relative potency are then obtained by a suitable multiplying
factor.

The nomogram in Fig. 1 is a graphical representation of these relations,
and will be described in terms of rectangular coordinates x and
y. The axes are drawn first, and suitable units of x and y chosen; in
practice the unit of y may conveniently be ten times the unit of x.
On the x-axis, a linear scale BB' is marked off, each unit of which has
a length depending on n and the level of significance used. This unit
is tabulated below.

TABLE 3
LENGTH OF UNIT DIVISION ON SCALE BB'

 n       2      3      4      5      6      7      8      9      10
 5%    0.821  1.119  1.212  1.243  1.248  1.240  1.227  1.206  1.192
 1%    0.537  0.790  0.882  0.913  0.925  0.921  0.913  0.905  0.893

Two scales are marked on the y-axis; the first of these is another linear
scale AA', each unit having a length of $\sqrt{2/3} = 0.8165$. The second
scale OO' is logarithmic, and the position of the graduation corresponding
to each potency ratio can be found from the relation y =
0.6124 × log potency ratio / log I. The two curves PP', QQ' are
given by the equation $x^2 = 1 + y^2$, and the scales on them are graduated
by the same logarithmic relation. The example shown in Fig. 1 is for
use when n = 3, I = 1.5 and the 5% limits of error are required.
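The construction can be imitated algebraically. The sketch below rests on one interpretation, which is an assumption rather than a statement from the text: joining the point for D/R on BB' to the point for D/B on AA' gives a straight line, and the limits are read where that line meets the two branches of x^2 = 1 + y^2. Under that reading the figures of the numerical example come out within about 0.005 of the printed limits, which is roughly the accuracy one might expect of a graphical reading. All names are hypothetical.

```python
import math

def nomogram_limits(D, B, R, unit_bb, dilution):
    """Lower limit, estimate, upper limit of relative potency (assumed construction)."""
    x_pt = unit_bb * D / R                  # point on scale BB' (x-axis)
    y_pt = math.sqrt(2.0 / 3.0) * D / B     # point on scale AA' (y-axis)
    # Line x/x_pt + y/y_pt = 1 meets x^2 = 1 + y^2 where
    # (x_pt^2 - y_pt^2) y^2 - 2 x_pt^2 y_pt y + y_pt^2 (x_pt^2 - 1) = 0.
    qa = x_pt ** 2 - y_pt ** 2
    qb = -2.0 * x_pt ** 2 * y_pt
    qc = y_pt ** 2 * (x_pt ** 2 - 1.0)
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    y_lo, y_hi = sorted(((-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)))

    def to_potency(y):
        # Inverse of y = 0.6124 * log(potency) / log(I).
        return dilution ** (y / 0.6124)

    return to_potency(y_lo), to_potency(y_pt), to_potency(y_hi)

# Numerical example: n = 3, 5% level, unit of BB' from Table 3.
lo, mid, hi = nomogram_limits(D=71, B=232, R=41, unit_bb=1.119, dilution=1.5)
print(round(lo, 3), round(mid, 3), round(hi, 3))
```

When the point on BB' lies at or inside x = 1 the line does not cut both branches in the required way, which corresponds to an assay too imprecise to give finite limits.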
QUERIES


QUERY 73: In the course of running a series of experiments, I
have encountered a problem which is probably simple to solve
but which I am incapable of solving. The problem is this:
Given two distributions with sample means $\bar{x}$ and $\bar{y}$, and corresponding
estimated standard errors, $s_x$ and $s_y$, what is the reliability
of the observed ratio, $\bar{x}/\bar{y}$, and how does one go about the setting of
confidence limits?
It seems to me that the problem I have stated is one of great practical
significance; it would not otherwise have been called to your attention.

ANSWER: Your problem is certainly of practical importance. In biological
assay, for example, chief interest lies usually in the
ratio of two weighted means of responses, and not in
differences of means. If we write the ratio of your two unweighted
means as

$m = \bar{x}/\bar{y}$,

the method commonly used is to take

(1)    $V(m) = \{V(\bar{x}) + m^2 V(\bar{y})\}/\bar{y}^2$

and to calculate confidence limits with the aid of this variance formula.
This must be condemned, since it seriously overestimates the precision
of m except when $\bar{y}$ is very much larger than its standard error; only
if $\bar{y}^2$ is at least 40 $V(\bar{y})$ for 95% limits, or 70 $V(\bar{y})$ for 99% limits, can
equation (1) be safely used.

A better procedure is based upon a theorem first stated by Fieller
in 1940 (Journal of the Royal Statistical Society, Supplement, Vol. 7,
pp. 1-64). Suppose that $\bar{x}$, $\bar{y}$ are means of $n_1$, $n_2$ observations
respectively, from distributions that may be assumed normal. Suppose
further that the two distributions have a common variance, which is
estimated by a mean square, $s^2$, with f degrees of freedom (f may be
$n_1 + n_2 - 2$, but can differ from this if the means $\bar{x}$, $\bar{y}$ are from an
analysis of variance and not just from two simple samples). Then, for
any value of a quantity $\mu$, $(\bar{x} - \mu\bar{y})$ is normally distributed,

$V(\bar{x} - \mu\bar{y}) = \sigma^2 \left( \frac{1}{n_1} + \frac{\mu^2}{n_2} \right)$,

and therefore

(2)    $t = \frac{\bar{x} - \mu\bar{y}}{s \left\{ \dfrac{1}{n_1} + \dfrac{\mu^2}{n_2} \right\}^{1/2}}$

follows the t-distribution with f degrees of freedom. If we now determine
$\mu$ by the condition that t shall be the tabular entry for an agreed
percentage point, the two values of $\mu$ must be the lowest and the highest
which are not significantly in conflict with

$E(\bar{x}) - \mu E(\bar{y}) = 0$;

thus they are fiducial limits to m. The equation for these limits is
a quadratic, and the two roots, $m_L$, $m_U$, may be found directly by the
formula

(3)    $m_L,\ m_U = \left[ m \pm \frac{ts}{\bar{y}} \left\{ \frac{1 - g}{n_1} + \frac{m^2}{n_2} \right\}^{1/2} \right] \Big/ (1 - g)$,

where

$g = \frac{t^2 s^2}{n_2 \bar{y}^2}$;

of the alternative signs in (3), "-" will give the lower, "+" the upper
fiducial limit. Note that when g is negligibly small the formula is the
same as that based upon (1), but when g is, say, greater than 0.1,
equation (1) may be seriously misleading. The limits given by (3) are
true fiducial limits to the ratio; I understand that doubts have been
expressed as to whether they are also confidence limits.
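The comparison between the two procedures can be sketched as follows. The limits are computed as the roots of (x̄ - μȳ)² = t²s²(1/n₁ + μ²/n₂), written in the closed form with g = t²s²/(n₂ȳ²), and set beside the limits from the variance formula (1). The tabular value of t is passed in by the user. The function names and the numerical inputs are hypothetical, and ȳ is assumed positive.

```python
import math

def fieller_limits(xbar, ybar, s2, n1, n2, t):
    """Fiducial limits for the ratio xbar/ybar with pooled mean square s2."""
    m = xbar / ybar
    g = t * t * s2 / (n2 * ybar * ybar)
    if g >= 1.0:
        raise ValueError("g >= 1: ybar is not significantly different from zero")
    half = (t * math.sqrt(s2) / ybar) * math.sqrt((1.0 - g) / n1 + m * m / n2)
    return (m - half) / (1.0 - g), (m + half) / (1.0 - g)

def approximate_limits(xbar, ybar, s2, n1, n2, t):
    """Limits from the variance formula (1), shown for comparison only."""
    m = xbar / ybar
    var_m = (s2 / n1 + m * m * s2 / n2) / (ybar * ybar)
    half = t * math.sqrt(var_m)
    return m - half, m + half

# Hypothetical data: two samples of 10, t = 2.101 for 18 degrees of freedom.
args = dict(xbar=10.0, ybar=5.0, s2=4.0, n1=10, n2=10, t=2.101)
f_lo, f_hi = fieller_limits(**args)
a_lo, a_hi = approximate_limits(**args)
print(f_lo, f_hi, a_lo, a_hi)
```

In this example g is about 0.07, and the interval from (1) is already visibly narrower than the fiducial interval.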

Fieller's theorem will still apply when $\bar{x}$, $\bar{y}$ are means of n correlated
variables, or when they are weighted means, either correlated or not;
weighted means include, for example, regression coefficients on another
variate. Fisher gives the method for correlated means in Section 62.1
of his Design of Experiments, and a generalization of (3) which includes
many cases appears as formula (4.7) in my Probit Analysis. You may
wish to know what happens if the variances in the populations from
which $\bar{x}$, $\bar{y}$ come are not equal. Separate mean squares, $s_1^2$ and $s_2^2$,
would then have to be used, and equation (2) would be modified so
as to replace t by a deviate from the Fisher-Behrens distribution (see
Fisher and Yates's Statistical Tables, Table V1). An explicit formula
such as (3) cannot be given, and the limits must now be obtained by
using a method of successive approximation to solve equation (2) for $\mu$.


D. J. FINNEY


QUERY 74: We would very much appreciate your advice regarding
application of the t-test to accumulated chi-square values for
a time series investigation.

The samples are frozen peas stored at -10°F and 0°F for periods of
4 to 40 weeks. Judges evaluated their quality at about 4 week intervals.
The -10°F and 0°F samples and a duplicate of one of these samples
were submitted each judging period (about 12 judges, 5 replicate judging
periods for each storage period). Judges were asked (1) to indicate
whether there was any difference between samples, (2) if there was a
difference to check duplicates, and (3) if there was a difference in flavor,
texture and color to indicate which sample or samples were best. Data
were analyzed by chi-square. In the case of best flavor, texture, color
data only the results for those who identified duplicates were included
in the chi-square analyses.

The results for identification of duplicates seem inconsistent. Chi-square
values were significant at 4, 14, 18, 20, 32, 36 and 40 weeks,
almost significant at 24 weeks but definitely not significant at 8 and 28
weeks (see condensed summary in Table 1 below).

The t-test applied to chi-square values accumulated up to each
storage period has been suggested for these data:

/N 
R. A. Fisher recommended this test to M. P. Masure of this laboratory
in correspondence in 1931 for data similar to ours. Masure used the
test in his publication, Effect of Ultraviolet Radiation on Growth and
Respiration of Pea Seeds, with Notes on Statistics, The Botanical Gazette
93, 21-41, 1932.


The chi-square formula we used was

$\chi^2 = \sum \frac{(|\text{observed} - \text{expected}| - 0.5)^2}{\text{expected}}$.

Expected, of course, was 1/3 total n for identified and 2/3 total n for
failed to identify. We permitted judges to say that there was no difference
between the 3 samples and added 1/3 "no difference" judgements
to identified and 2/3 to failed to identify (judges who said there was a
difference but checked the wrong samples as duplicates). Some investigators
insist that judges always choose two samples as duplicates.
We think it psychologically wrong to force a choice when the judge has
done his best and still cannot detect any difference between samples.
If there is any statistical advantage for forcing a choice or permitting
"no difference" answers, we think the advantage is in favor of the latter
because the number who indicate no difference is small and might not
distribute normally.


TABLE 1

Storage        Number of         Identifications         No
Period Wks.    Judgements      Correct    Incorrect    Difference

     4             51             21          16           14
     8             52             12          28           12
    12             75             30          28           17
    16             68             39          23            6
    20             68             34          29            5
    24             61             26          30            5
    28             68             22          38            8
    32             76             42          30            4
    36             68             40          28            0
    40             60             36          24            0


If we have made a suitable application of the t-test, we think we 
can conclude that the difference between samples is significant at 18 
weeks. Our queries are: have we applied a proper statistic and is our 
conclusion correct for these samples? Also, do our data show anything 
else of statistical significance? Do you know of a literature reference 
to the application of the t-test as discussed here? Do you suggest some 
other treatment of our data? 


ANSWER: We must be careful to separate two questions: "How
soon have I accumulated evidence that there is a real
difference at a given level of significance?" and "How does
the difference seem to change with time?" There are points worthy of
careful attention in both cases.

The process of continually combining the present trial with
those already accumulated, testing the significance of all the data to
date, and continuing until "significance" is reached can be dangerous.
If really carried out in this way, it is sure to reach "significance" no
matter what the true state of affairs. You are exempt from the worst
features of this difficulty, because you have previous knowledge that
the two storage temperatures will really be different after a long enough
time. (However, others might find real difficulty here.)

The method outlined by Masure is surely sound, provided, as he 
carefully stated, that all the deviations involved in the chi-squares were 
in the same direction. However, when you are calculating chi-squares 
to be combined, you should not make the 0.5 correction for continuity, 
as Cochran (Iowa State College Jour. Sci. 16:421-436, 1942) has shown. 

A simpler way to analyze data such as yours, where the null hy- 
pothesis of pure chance gives a fixed chance of one-third of identifying 
correctly, is the following: Accumulate the number of correct identifi- 
cations, accumulate separately the number of incorrect identifications, 
and test to see if the ratio is significantly different from 1 to 2. Thus, 
after 20 weeks, you have a total of 136 correct and 124 incorrect (and 
54 “no difference’ which we discard for the present). We can test this 
against the 1 to 2 ratio using 


(i) chi-square (where we do make the 0.5 correction)

$\chi^2 = \frac{(136 - 86.7 - 0.5)^2}{86.7} + \frac{(124 - 173.3 + 0.5)^2}{173.3} = 41.2$,

which is very significant on one degree of freedom,

(ii) a simple graphical method (see Frederick Mosteller and John W.
Tukey, "The uses and usefulness of binomial probability paper",
Jour. Amer. Stat. Assn., 1949, Example 1)

(iii) the simple formula for an approximate normal deviate corresponding
to the use of binomial probability paper

$2\left[\sqrt{136\,(2/3)} - \sqrt{(124 + 1)(1/3)}\right] = 6.13$,

where we have multiplied each observed number by the other theoretical
probability and have increased that observed number by
unity which reduces the difference of the square roots. A normal
deviate of 6.13 is also very significant. This last formula gives as
accurate results as the corrected chi-square and is very convenient
with a slide rule.
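Both numerical tests can be retraced from the accumulated counts at 20 weeks. In the sketch below the corrected chi-square follows the expression in (i). The normal deviate in (iii) is written as twice the difference of square roots, a form inferred here from the verbal description and the printed value 6.13, so it should be treated as an assumption. Variable names are hypothetical.

```python
import math

correct, incorrect = 136, 124   # accumulated through 20 weeks
p, q = 1.0 / 3.0, 2.0 / 3.0     # chance of correct / incorrect under pure guessing
n = correct + incorrect

# (i) chi-square with the 0.5 correction for continuity, one degree of freedom.
exp_c, exp_i = n * p, n * q
chi2 = ((abs(correct - exp_c) - 0.5) ** 2 / exp_c
        + (abs(incorrect - exp_i) - 0.5) ** 2 / exp_i)

# (iii) approximate normal deviate: each count is multiplied by the other
# theoretical probability, and the count that narrows the gap is raised by one.
deviate = 2.0 * (math.sqrt(correct * q) - math.sqrt((incorrect + 1) * p))

print(round(chi2, 1), round(deviate, 2))
```

The square root of the corrected chi-square is close to the deviate, as it should be when both approximate the same test.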


You included the judgements of "no difference" in your analysis. For
the purpose of judging significance alone, there seems to be no reason
to include them, and several to leave them out. But if you want an
indication of the magnitude of the differences between storage temperatures,
as measured by the ability of these judges to detect differences
correctly, then these judgements of "no difference" are important, and
should come in. It seems natural to score as follows:


correct identification +1 
incorrect identification —1/2 
no difference 0 


and then to take the average score, which we may express as a per- 
centage for convenience, as a measure of difference. Arithmetically, this 
simplifies to 
(correct) — 1/2(incorrect) 
(total judgements) 


and should have a variance due to sampling of not more than 


$$\frac{1}{2(\text{total judgements})}$$
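
As a minimal sketch of the scoring rule, the two quantities above can be written as small functions. The function names are hypothetical. The pooled counts of 136, 124 and 54 are reused from earlier in this reply purely as an illustration, not as a result stated in the original. One way to read the weight of -1/2 is that a judge guessing at random, right one time in three, scores zero on average.

```python
def average_score(correct, incorrect, no_difference):
    """Mean score with +1 for correct, -1/2 for incorrect, 0 for no difference."""
    total = correct + incorrect + no_difference
    return (correct - 0.5 * incorrect) / total


def max_sampling_variance(total_judgements):
    """Upper bound on the sampling variance of the average score, as stated above."""
    return 1 / (2 * total_judgements)


# Pure guessing at the 1 to 2 ratio gives an average score of zero.
print(average_score(1, 2, 0))

# Illustration only: the pooled counts quoted earlier in the reply.
print(round(100 * average_score(136, 124, 54), 1), "percent")
print(max_sampling_variance(65))
```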


Your ten trials score 25%, —4%, 21%, 40%, 29%, 18%, 4%, 35%, 
38%, 40%. The greatest standard deviation expected from sampling is 
about 


$$\frac{1}{\sqrt{2(65)}} = .088,$$


or about 8.8%, and a value somewhat smaller than this is reasonable. 
With the exception of the values at 8 and 28 weeks, your results seem 
to be consistent with a true score of about 30, not changing with time. 
Could you make a test after one day’s storage? 
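
The comparison of the ten scores with a constant true score can be sketched as follows. The figure of 65 judgements per trial is read off the standard deviation formula above. The reference score of 30 comes from the text. The cut-off of two standard deviations is an assumption made only for this illustration, and the sketch makes no attempt to match scores to storage weeks.

```python
import math

# Ten trial scores, in percent, as quoted in the reply.
scores = [25, -4, 21, 40, 29, 18, 4, 35, 38, 40]

judgements_per_trial = 65  # taken from the 2(65) in the formula above
reference_score = 30       # the "true score of about 30" suggested in the text
cutoff = 2.0               # assumed threshold, not from the original

# Greatest standard deviation expected from sampling, in percent.
max_sd = 100 * math.sqrt(1 / (2 * judgements_per_trial))

# Standardized distance of each score from the reference score.
deviates = [(score - reference_score) / max_sd for score in scores]

flagged = [score for score, z in zip(scores, deviates) if abs(z) > cutoff]
print(round(max_sd, 2), flagged)
```

Because the true standard deviation is expected to be somewhat smaller than this upper bound, the standardized distances are, if anything, understated.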


John W. Tukey 
THE BIOMETRIC SOCIETY 


Benelux Meeting. Following some admirable preliminary work by 
our Council member, Dr. Neurdenburg, the State University at Utrecht 
was host for a program of papers on biometry this last July 2nd. Dr. 
W. A. Mijsberg, Professor of anatomy and embryology at Utrecht, 
discussed frequency curves of stature from 1811 to the present time with 
particular relation to the effect of economic status upon the ability to 
reach full genotypical height. Director P. deWolff of the Municipal 
Statistical Bureau of Amsterdam discussed statistical problems 
concerned in the assay of vitamin D3 in chickens. Mr. J. A. Enters, an 
industrial statistician, reported on the use of measurements of Dutch 
men and women for reducing fitting costs in the clothing industry. Dr. 
E. van der Laan of the Agricultural State College at Wageningen sur- 
veyed the different types of experimental design which are now avail- 
able. Dr. G. A. Gussenhoven outlined his results in combining factors 
recorded in the case histories of patients with pulmonary tuberculosis. 
Dr. R. A. M. Bergman, Professor of the Medical Faculty of Batavia, 
described his researches on the linear growth of snakes. Dr. J. ten 
Doesschate reported on some quantitative aspects of the gerontology of 
the eye. 

About 30 scientists attended the meeting, including one from Bel- 
gium, and at the close of the all-day session the audience voted for a 
second meeting. Another result has been the material increase in the 
membership of the Society in Holland. Dr. Neurdenburg is to be con- 
gratulated on so auspicious a beginning toward the development of a 
Benelux Region. 

The Second International Biometric Conference. The Second Inter- 
national Biometric Conference convened at the University of Geneva 
in Switzerland on August 30, and continued for four days with a total 
registration of 102. The Governments of Belgium, Great Britain, Greece, 
Italy, Netherlands, Portugal, Spain and Venezuela named official dele- 
gates. Thirteen others attended as delegates of Academies of Science, 
international organizations, municipalities, Regions of the Biometric 
Society or other organizations. Some 19 countries were represented in 
all, Great Britain leading with 18, and followed by Switzerland with 14, 
France 10, Italy 10, Netherlands 10, Denmark 9, United States 9, 
Belgium 5, India 3, Portugal 3, Argentina 2, Australia 2, and Canada, 
Finland, Greece, Mexico, Spain, Sweden and Venezuela with one each. 

The Conference opened with a welcoming address by Professor G. 
Tiercy, Rector of the University of Geneva, and by Professor A. 
Franceschetti of the Faculty of Medicine, who spoke successively in 
French, English and Italian, and continued with a business meeting of 
the Society. The afternoon session on experimental design, with Dr. 
Yates in the chair, featured papers by Professors Cox and Quenouille, 
with discussions by Drs. Hald, Astrand, Rasch, Bernstein, Schutzen- 
berger, Fisher, Healy and Bartlett, and a concluding summary by the 
chairman. At the end of the session Professor and Mrs. Franceschetti 
entertained members of the Conference with a delicious high tea at 
their country home on the shores of Lake Geneva where we were also 
rewarded with a splendid view of Mt. Blanc in the setting sun. 

The morning session on August 31 concerned the recent applications 
of biometrical methods in genetics under the chairmanship of Professor 
Fisher. The papers by Drs. Yates, Cavalli and Finney were discussed 
individually, both by the speakers and in addition by Profs. Cochran, 
Pompilj, Chodat, Haldane, Bernstein and Healy. The afternoon session 
on biometrical aspects of biological assay under the chairmanship of 
Dr. Finney offered papers by Drs. Irwin and Perry and a lively discussion 
by Drs. Bliss, Jerne, Tripod, Fieller, Hartley, Bernstein, Rasch, Martin 
and the chairman. 

On September 1 the morning session concerned the present status 
of biometry with Professor Darmois in the chair and a remarkably lucid 
paper by Prof. Cochran which was discussed by Drs. Hopkins, Mahal- 
anobis, Gini, Haldane, Rasch, Kemp and Rapaport. The afternoon 
session on industrial applications of biometry was chaired by Dr. 
Astrand and had as its principal speaker Dr. Davies with discussions by 
Miss Day, Drs. Fieller and Hald. That evening the members of the 
Biometric Conference and of the International Union for the Study of 
Populations, which was meeting concurrently in Geneva, were enter- 
tained at a reception by the State Council of the Canton of Geneva and 
by the Municipal Council of the Town of Geneva at Palais Eynard, a 
most colorful affair. 

The session of the morning of September 2 under the chairmanship 
of Prof. Mahalanobis considered teaching and education in biometry. 
The principal paper by Professor Bartlett was followed by an active 
discussion which included Profs. Darmois, Cochran, Cox, Gini, Roy and 
Vessereau, also Drs. Finney, Bliss, Yates, van der Laan and Martin. 
At the conclusion of the session it was resolved that teaching material 
on Biometry be assembled by the Society for distribution to members 
and others who are interested. The final scientific session on the after- 
noon of September 2 under the chairmanship of Dr. Buzzati-Traverso 
consisted of four contributed papers read by Drs. Rapaport, Boeri, 
Schwartz and Nass. A final business meeting concluded the Conference. 

Most of the principal papers had been mimeographed in advance so 
that copies were available for those attending. English and French 
were official languages and all discussions were translated most com- 
petently from one language into the other. A photographer was active 
during the conference and the morning session on August 31 was in- 
terrupted for a group photograph on the steps of the University building. 
It is planned to publish the Proceedings of the Conference as completely 
as possible in BIOMETRICS during the coming year with the aid of 
a grant of $800 from UNESCO. It is hoped that reprints will be 
available of any papers which are published elsewhere for distribution 
to members of the Society. 

Two Council meetings were held during the Conference on the 
evenings of August 29 and September 1. They were attended by Miss 
Cox and Messrs. Hopkins, Mahalanobis, Schwartz, Neurdenburg, Fisher, 
Cochran, Bliss, Buzzati-Traverso, Linder, Finney and Yates. The 
meetings were concerned primarily with problems arising in the Con- 
ference or which will appear in later issues of BIOMETRICS. 

All members of the International Statistical Institute, with which 
we are affiliated, were invited to participate in the 2nd International 
Biometric Conference. Similarly, all those attending the Conference in 
Geneva were invited to attend the meetings of the International Sta- 
tistical Institute in Berne during the following week. Many of those 
attending both conferences took advantage of a special train which left 
Geneva at 8:30 AM September 3 for Berne via Sion, Lotschberg, Inter- 
laken, Thun and Lucerne. The weather conditions were perfect and all 
enjoyed some magnificent views of Swiss mountain scenery. 

The success of the Conference was due in large part to the excellent 
work of Professor Arthur Linder who served as Secretary of the Con- 
ference Committee. Those of us who had the good fortune to attend 
the meetings will long remember the many courtesies of Professor 
Linder and his aides. 

ISI meetings in Berne. The 26th Session of the International Sta- 
tistical Institute convened at the University in Berne on September 5. 
Fifteen of the papers on the ISI program were by members of the 
Biometric Society and lay in the fields of statistical sampling, industrial 
applications of statistical methods, statistical education, recent de- 
velopments in statistics and demography. These sessions enjoyed the 
same good weather as those in Geneva. One meeting of the Bureau of 
the ISI was attended by representatives of affiliated organizations in- 
cluding the Biometric Society, and considered how we can obtain the 
greatest advantage from our affiliations. 

An International Statistical Seminar under the auspices of the ISI 
was held during the two weeks following the meetings at Berne. The 
first week at the University in Berne on September 12-17 included 
several lectures on experimental design and industrial applications by 
members of the Biometric Society including Professors Cox, Bliss, 
Quenouille, Linder and Day. For the second week the sessions were 
shifted to Geneva and emphasized statistical sampling with Professors 
Darmois, Deming, Linden, Madhava, Mahalanobis and Yates among 
the lecturers. 

Project on Training in Biometry. The session on teaching and edu- 
cation of biometry at Geneva revealed the need for a wide exchange of 
information on the material covered by courses in this field. Syllabi 
are wanted showing the time spent on each topic in courses on bio- 
metry or including biometry in different universities of the world. 
The Society has been asked to assemble such information and make it 
available to teachers in the field. The subject is also of interest to 
UNESCO and certain of its affiliated agencies. Plans are now being 
made to assemble this information and progress will be announced in 
later issues of BIOMETRICS. A committee consisting of Professors 
W. G. Cochran (Chairman), C. I. Bliss, A. Buzzati-Traverso, G. Darmois 
and K. Mather has been named by President Fisher to undertake this 
project. 
NEWS AND NOTES 


BIOMETRIC SECTION OF THE AMERICAN STATISTICAL AS- 
SOCIATION ANNUAL MEETING, DECEMBER 28-30, BILTMORE 
HOTEL, NEW YORK CITY: Sessions arranged by the Biometric Sec- 
tion. Joining organization: Biometric Society. 

Wednesday, December 28, 4-6. 
Topic: The use of rationally developed equations in biology. Chair- 
man: Horace W. Norton. 
Papers: An interpretation of the formation of active bacterial virus 
from ultraviolet inactivated virus. S. E. Luria. The application of 
equations derived from models, to “central” circulatory volume. 
Elliot V. Newman and Margaret Merrell. 
Discussants: Joseph Berkson and L. J. Savage. 


Thursday, December 29, 2-4. 
Topic: Long-time follow-up in morbidity studies. Chairman: John 
W. Fertig. 
Papers: The definition of the group to be followed. Paul M. Densen. 
Timing of the distribution of the events between observations. 
T. E. Harris, Paul Meier and John W. Tukey. Methods of analysis 
in follow-up studies. Harold F. Dorn. 
Discussants: Hugo Muench, Rowland Rider and Mortimer 
Spiegelman. 


Friday, December 30, 10-12. 
Topic: Contributed papers. Chairman: Frederick Mosteller. 
Papers: Relative precision of minimum χ² and maximum likelihood 
estimates of regression coefficients, with particular reference to bio- 
assay. Joseph Berkson. Malformations at the Boston Lying-in 
Hospital, 1930 to 1941. Jane Worcester and Stuart S. Stevenson. 
A statistic for rating diagnostic tests. W. J. Youden. 
Discussants: Paul Bruyere, Chester I. Bliss and John Tukey. 


NORTH CAROLINA INSTITUTE OF STATISTICS TASTE TEST- 
ING CONFERENCE—On November 7-10, the University of North 
Carolina Institute of Statistics held another in a series of statistical work 
conferences under the sponsorship of the General Education Board, this 
time in the field of taste testing. About 25 leaders in the field were 
present. The program included various aspects of organizing and 
conducting taste tests with particular emphasis upon statistical procedures 
available for this type of experimentation. Some of the subjects 
discussed were: “Fundamentals of Flavor Characterization,” by E. C. 
Crocker of Arthur D. Little, Inc., Cambridge, Massachusetts; “Layout 
and Design of Flavor-Preference Panels,” by W. Franklin Dove, Food 
Consultant, Oak Park, Illinois; “Scoring Systems,” by J. W. Hopkins, 
Division of Applied Biology, National Research Laboratories, Ottawa, 
Ontario, Canada; techniques of laboratory testing; food surveys; the 
replacement of organoleptic tests by physical-chemical methods; and 
designs which are appropriate for tasting experiment. 


AUSTRIA—The following quotation was taken from a letter from 
W. Winkler, Director of the Institute of Statistics of the University of 
Vienna with whom we have recently agreed to exchange Biometrics for 
Statistische Vierteljahresschrift. “Our Institute of Statistics corresponds 
rather to the Department of Statistics in your universities or still to 
less. The hitherto regulations about statistical teaching are rather 
poor, but on the way to being replaced by more sufficient ones. Also 
the introduction of courses for ‘professional statisticians’ is in preparation 
with the aim of getting the title of a ‘diploma statistician’. As all that 
is only on the way and the present state of things poor, I renounce de- 
scribing it and request you kindly to have patience till I am in a position 
to write of reforms already performed.” 


CZECHOSLOVAKIA—From the School of Agriculture and Forestry, 
Czech Technical University, Praha, Czechoslovakia, Vaclav Myslivec 
writes, “At our Czech Technical University, faculty of agriculture and 
forestry, I am giving a couple of courses. These courses are: Elements 
of higher mathematics, Biometrics and Experimental Statistics. The 
second course is given for the students of forestry and the third for 
students of agriculture. In the second course I am giving fundaments 
of sampling theory, which is very important for forestry mensuration 
and taxation. In the third course, which is highly important for agri- 
cultural experimentation, I am giving lectures on fundaments of mathe- 
matical statistics, and most of the time is given to analyses of variance 
and to design of laboratory and field experiments. I have found a lot 
of interest among my students and particularly among research workers 
not only in the school but also in the experimental stations. The first 
course, concerning calculus, is a preparatory course for my second and 
third course. I really enjoy my work in this field of science. The main 
reason for my joy is that I found a rather big audience and secondly 
that I am bringing to my pupils quite a new and important knowledge. 
This knowledge is of fundamental importance for all research work in 
agriculture and forestry.” 


HAWAII—O. E. Sette of the United States Fish and Wildlife Service 
informs us that his recent move from San Francisco to Honolulu was 
in connection with the establishment of a new activity within the Fish 
and Wildlife service to be known as the Pacific Oceanic Fishery In- 
vestigations of which Mr. Sette is Director. Milner Baily Schaefer, 
formerly with the South Pacific Investigations of the United States 
Fish and Wildlife Service with headquarters at Stanford University, 
California, has been transferred to the Pacific Oceanic Fishery Investiga- 
tions. The staff is now occupying temporary quarters furnished by the 
Navy but expects to be able, sometime after the first of January, 1950, to 
move into a new laboratory being constructed on the campus of the 
University of Hawaii. These Investigations are engaged in studies 
toward the development of the now unutilized high seas fisheries of the 
Pacific oceanic areas embracing the region to the southward of the 
Hawaiian Islands, and the areas to the westward which were formerly 
under a Japanese mandate but now constitute the Trust Territories of 
the Pacific. The Section of Biology and Oceanography of which Mr. 
Schaefer is chief, is engaged in studies of the biology, ecology and dis- 
tribution of the tunas and other pelagic fishes and the relationship thereof 
to the various factors of the oceanic environment such as currents, 
temperature distributions, and other physical and chemical factors. 


UNITED STATES—For having written and published an article 
entitled “Casualties of the United States Eighth Air Force in World 
War II”, James A. Rafferty, chief of the Department of Biometrics at 
the U.S. Air Force School of Aviation Medicine, Randolph Air Force 
Base, has received commendations from Brig. Gen. Otis O. Benson, Jr., 
Commandant of the School of Aviation Medicine, and Col. George F. 
Baier III, surgeon of the Air University at Maxwell Air Force Base, Ala. 
The report was startling in that it revealed that about one half of the 
casualties reported were of non-battle types. ... Fred A. Schultz, Director 
of Pharmaceutical Research, Commercial Solvents Corporation, Terre 
Haute, Indiana, informs us that his interest is concerned with the 
statistical evaluation of data obtained in animal experimentation. “At 
the present time we are carrying out a large number of experiments for 
the evaluation of compounds for their possible use as therapeutic agents. 
Needless to say, in the evaluation of our animal results a statistical 
analysis of the data is essential.” . . . At Iowa State, Gerhard Tintner 
has returned to the staff of the Statistical Laboratory and the Economics 
Department after a year’s leave of absence to work at Cambridge 
University in the Department of Applied Economics; Robert G. D. Steel 
has taken a position with the College of Agriculture at the University of 
Wisconsin, after receiving his Ph.D. at Ames in June. Mr. Steel’s 
dissertation was, “Minimum Generalized Variance for a Group of Linear 
Functions”; Bernard Ostle received his Ph.D. in Statistics at the end of 
the summer session with a dissertation “On Certain Criteria for Optimum 
Estimation.” He remains at Iowa State as Assistant Professor in the 
Department of Statistics; Osmer Carpenter received his Ph.D. in June 
and returned to his position with the Atomic Energy Commission at 
Oak Ridge, Tennessee. His dissertation was, “Sequential Tests of the 
Linear Hypothesis”; Douglas Robson obtained his B.S. in Statistics in 
June and is now working with Walter Federer in the Department of 
Plant Breeding at Cornell University; joining the ranks of the thirty- 
odd graduate students majoring in Statistics in the Department are 
Om Prakash Aggarwal from Delhi University, India, and Bertil Matérn, 
a student of Cramér, from the Forest Research Institute, Sweden. . . . 
E. L. Cox formerly at Virginia Polytechnic Institute and who has just 
spent a summer in Eastern Canada chasing fishes for the Atlantic Salmon 
Investigation, is now with the Institute of Statistics working on the 
application of statistics to fishery research problems. He is, to quote Mr. 
Cox, “figuring how to catch more fishes by putting statistics instead of 
salt on their tails.” Also with the Institute are Dan Teicherow, from the 
University of Toronto who is working on a doctorate in Experimental 
Statistics, and A. Grandage from Schering Corporation, Bloomfield, New 
Jersey, whose major interest is in the statistical aspects of biological 
assay... . From the Department of Mathematics, Statistical Labora- 
tory at the University of California, Berkeley, we have a report on 
recent changes of status. Henry B. Mann of Ohio State University has 
accepted a Visiting Professorship and Research Associateship for the 
academic year 1949-1950; J. Neyman, Director, will be on sabbatical 
leave for the Spring Semester, 1950; Joseph L. Hodges, Jr. has been 
promoted to Assistant Professor and Research Associate; Charles M. 
Stein, Assistant Professor and Research Associate, will be on leave for 
the academic year 1949-1950, and will be working in Paris as a National 
Research Fellow; Douglas G. Chapman, Mark W. Eudey, Elizabeth L. 
Scott and Ester Seiden obtained their Ph.D. degrees in Statistics at the 
University of California. Douglas G. Chapman has accepted an Assistant 
Professorship at the University of Washington, Seattle. Mark W. Eudey 
is now Vice-President of California Municipal Statistics. Elizabeth L. 
Scott and Ester Seiden have been promoted to Lecturer and Research 
Associate at the Statistical Laboratory. 